Patent abstract:
A method of video decoding comprising: receiving an encoded block of video data; determining a transform for the encoded block of video data, where the transform has a size S that is not a power of two; rounding S to a power of two to create a transform with a modified size S'; applying an inverse transform with the modified size S' to the encoded block of video data to create residual video data; and decoding the residual video data to create a decoded block of video data.
Publication number: BR112019013645A2
Application number: R112019013645
Filing date: 2018-01-05
Publication date: 2020-01-21
Inventors: Chuang Hsiao-Chiang; Chen Jianle; Zhang Li; Karczewicz Marta; Li Xiang; Zhao Xin
Applicant: Qualcomm Inc
IPC primary class:
Patent description:

MULTI-TYPE TREE STRUCTURE FOR VIDEO CODING [0001] This disclosure claims the benefit of U.S. Provisional Application No. 62/443,569, filed on January 6, 2017, the entire content of which is incorporated herein by reference.
TECHNICAL FIELD [0002] This disclosure relates to video encoding and video decoding.
BACKGROUND [0003] Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video game devices, video game consoles, cellular or satellite radio telephones, so-called smartphones, video conferencing devices, video streaming devices, and the like. Digital video devices implement video encoding techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), and extensions of such standards. Video devices can transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video encoding techniques.
Petition 870190061136, of 7/1/2019, p. 8/139

[0004] Video encoding techniques include spatial prediction (intra-picture prediction) and/or temporal prediction (inter-picture prediction) to reduce or remove the redundancy inherent in video sequences. For block-based video encoding, a video slice (for example, a video frame or a portion of a video frame) can be divided into video blocks, which can also be referred to as treeblocks, coding units (CUs), and/or coding nodes. Images can be referred to as frames. Reference images can be referred to as reference frames.
[0005] Spatial or temporal prediction results in a predictive block for a block to be coded. Residual data represents pixel differences between the original block to be encoded and the predictive block. For additional compression, the residual data can be transformed from the pixel domain to a transform domain, resulting in residual transform coefficients, which can then be quantized. Entropy coding can be applied to achieve even further compression.
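As a concrete illustration of the residual step described above, the sketch below forms pixel-wise differences between an original block and its predictive block; the 2x2 sample values are made up for illustration and do not come from this disclosure:

```python
def residual_block(original, predicted):
    """Pixel-wise difference between an original block and its predictive
    block, as described in [0005]. Blocks are lists of rows of samples."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, predicted)]

# Illustrative 2x2 blocks (hypothetical sample values).
original = [[100, 102], [98, 97]]
predicted = [[99, 100], [100, 96]]
print(residual_block(original, predicted))  # [[1, 2], [-2, 1]]
```

In a real encoder, these residual values would then be transformed, quantized, and entropy coded as the paragraph above outlines.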
SUMMARY [0006] This disclosure describes techniques for splitting blocks of video data using a multi-type tree (MTT) structure. The techniques of the present disclosure include determining a plurality of partitioning techniques at various nodes of a tree structure. Examples of the plurality of partitioning techniques may include partitioning techniques that symmetrically divide a block through the center of the block,
as well as partitioning techniques that divide a block, symmetrically or asymmetrically, such that the center of the block is not divided. In this way, the partitioning of video blocks can be performed in a manner that leads to more efficient coding, including partitioning that better captures objects in the video data at the center of blocks.
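The partition families just described can be sketched as follows. The mode names and the 1/4-1/2-1/4 split ratios used for the center-side modes are assumptions made for illustration, not normative definitions from this disclosure:

```python
def partition(width, height, mode):
    """Return child (width, height, x_off, y_off) tuples for a block split
    under several partition modes (illustrative sketch)."""
    if mode == "quadtree":  # four equal quadrants, split through the center
        w, h = width // 2, height // 2
        return [(w, h, 0, 0), (w, h, w, 0), (w, h, 0, h), (w, h, w, h)]
    if mode == "binary_vertical":  # two equal halves, side by side
        w = width // 2
        return [(w, height, 0, 0), (w, height, w, 0)]
    if mode == "binary_horizontal":  # two equal halves, stacked
        h = height // 2
        return [(width, h, 0, 0), (width, h, 0, h)]
    if mode == "center_side_vertical":  # 1/4, 1/2, 1/4 columns: center kept whole
        q = width // 4
        return [(q, height, 0, 0), (2 * q, height, q, 0), (q, height, 3 * q, 0)]
    if mode == "center_side_horizontal":  # 1/4, 1/2, 1/4 rows: center kept whole
        q = height // 4
        return [(width, q, 0, 0), (width, 2 * q, 0, q), (width, q, 0, 3 * q)]
    raise ValueError(mode)

print(partition(32, 32, "center_side_vertical"))
```

Note how the center-side modes leave the middle of the block in a single child, matching the motivation of keeping an object at the block center undivided.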
[0007] This disclosure further describes techniques for applying transforms to blocks partitioned according to an MTT structure, techniques for generating and parsing syntax elements indicating how blocks are divided according to the MTT structure, techniques for partitioning luma and chroma blocks according to an MTT structure, and techniques for coding (that is, encoding and/or decoding) blocks partitioned according to an MTT structure. The techniques described in this disclosure can be used individually, or together in any combination.
[0008] In an example of the invention, a method for decoding video data comprises receiving an encoded block of video data, determining a transform for the encoded block of video data, wherein the transform has a size S that is not a power of two, rounding S to a power of two to create an inverse transform with a modified size S', applying the inverse transform with the modified size S' to the encoded block of video data to create residual video data, and decoding the residual video data to create a decoded block of video data.
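The size-rounding step can be sketched as below. The claim does not specify the rounding rule, so rounding to the nearest power of two (with ties rounding up) is an assumption made here purely for illustration:

```python
def round_to_power_of_two(s: int) -> int:
    """Round a transform size s to the nearest power of two (ties round up).
    Hypothetical helper illustrating the modified size S' of [0008]."""
    if s < 1:
        raise ValueError("size must be positive")
    lower = 1 << (s.bit_length() - 1)  # largest power of two <= s
    upper = lower << 1                 # smallest power of two > s
    return lower if (s - lower) < (upper - s) else upper

print(round_to_power_of_two(12))  # 16
```

Under this sketch, a 12-point transform would be replaced by a 16-point one, for which fast power-of-two transform implementations are commonly available.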
[0009] In another example of the invention, a method of encoding video data comprises receiving a block of video data, predicting the block of video data to create residual video data, determining a transform for the residual video data, wherein the transform has a size S that is not a power of two, rounding S to a power of two to create a transform with a modified size S', applying the transform with the modified size S' to the residual video data to create transform coefficients, and encoding the transform coefficients into an encoded video bitstream.
[0010] In another example of the disclosure, a device configured to decode video data comprises a memory configured to store the video data, and one or more processors in communication with the memory, the one or more processors configured to receive an encoded block of video data, determine a transform for the encoded block of video data, wherein the transform has a size S that is not a power of two, round S to a power of two to create an inverse transform with a modified size S', apply the inverse transform with the modified size S' to the encoded block of video data to create residual video data, and decode the residual video data to create a decoded block of video data.
[0011] In another example of the disclosure, a device configured to encode video data comprises a memory configured to store the video data, and one or more processors in communication with the memory, the one or more processors configured to receive a block of video data, predict the block of video data to create residual video data, determine a transform for the residual video data, wherein the transform has a size S that is not a power of two, round S to a power of two to create a transform with a modified size S', apply the transform with the modified size S' to the residual video data to create transform coefficients, and encode the transform coefficients into an encoded video bitstream.
[0012] In another example of the disclosure, an apparatus configured to decode video data comprises means for receiving an encoded block of video data, means for determining a transform for the encoded block of video data, wherein the transform has a size S that is not a power of two, means for rounding S to a power of two to create an inverse transform with a modified size S', means for applying the inverse transform with the modified size S' to the encoded block of video data to create residual video data, and means for decoding the residual video data to create a decoded block of video data.
[0013] In another example of the disclosure, an apparatus configured to encode video data comprises means for receiving a block of video data, means for predicting the block of video data to create residual video data, means for determining a transform for the residual video data, wherein the transform has a size S that is not a power of two, means for rounding S to a power of two to create a transform with a modified size S', means for applying the transform with the modified size S' to the residual video data to create transform coefficients, and means for encoding the transform coefficients into an encoded video bitstream.
[0014] In another example, this disclosure describes a computer-readable storage medium that stores instructions that, when executed, cause one or more processors of a device configured to decode video data to receive an encoded block of video data, determine a transform for the encoded block of video data, wherein the transform has a size S that is not a power of two, round S to a power of two to create an inverse transform with a modified size S', apply the inverse transform with the modified size S' to the encoded block of video data to create residual video data, and decode the residual video data to create a decoded block of video data.
[0015] In another example, this disclosure describes a computer-readable storage medium that stores instructions that, when executed, cause one or more processors of a device configured to encode video data to receive a block of video data, predict the block of video data to create residual video data, determine a transform for the residual video data, wherein the transform has a size S that is not a power of two, round S to a power of two to create a transform with a modified size S', apply the transform with the modified size S' to the residual video data to create transform coefficients, and encode the transform coefficients into an encoded video bitstream.
[0016] Details of one or more examples are presented in the accompanying drawings and in the description below. Other features, objects, and advantages will be apparent from the description, drawings, and claims.
BRIEF DESCRIPTION OF THE DRAWINGS [0017] Figure 1 is a block diagram that illustrates an exemplary video encoding and decoding system configured to implement the techniques of this disclosure.
[0018] Figure 2 is a conceptual diagram that illustrates the coding unit (CU) structure in High Efficiency Video Coding (HEVC).
[0019] Figure 3 is a conceptual diagram that illustrates exemplary partition types for an inter-prediction mode.
[0020] Figure 4A is a conceptual diagram that illustrates an example of block partitioning using a quadtree plus binary tree (QTBT) structure.
[0021] Figure 4B is a conceptual diagram that illustrates an example of a tree structure corresponding to the block partitioning using the QTBT structure of figure 4A.
[0022] Figure 5A is a conceptual diagram illustrating quadtree partitioning.
[0023] Figure 5B is a conceptual diagram that illustrates vertical binary tree partitioning.
[0024] Figure 5C is a conceptual diagram that illustrates horizontal binary tree partitioning.
[0025] Figure 5D is a conceptual diagram that illustrates vertical center-side tree partitioning.
[0026] Figure 5E is a conceptual diagram that illustrates horizontal center-side tree partitioning.
[0027] Figure 6 is a conceptual diagram that
illustrates an example of coding tree unit (CTU) partitioning according to the techniques of this disclosure.
[0028] Figure 7 is a conceptual diagram that illustrates exemplary asymmetric partitions according to an example of QTBT partitioning.
[0029] Figure 8 is a conceptual diagram that illustrates a dead-zone plus uniform quantization scheme.
[0030] Figure 9 shows exemplary asymmetric partition types.
[0031] Figure 10 is a block diagram that illustrates an example of a video encoder.
[0032] Figure 11 is a block diagram that illustrates an example of a video decoder.
[0033] Figure 12 is a flowchart showing an exemplary encoding method of this disclosure.
[0034] Figure 13 is a flowchart showing an exemplary decoding method of this disclosure.
DETAILED DESCRIPTION [0035] This disclosure relates to partitioning and/or organizing blocks of video data (e.g., coding units) in block-based video encoding. The techniques of the present disclosure can be applied in video encoding standards. In several examples described below, the techniques of this disclosure include partitioning blocks of video data using three or more different partitioning structures. In some examples, three or more different partitioning structures can be used at each depth of a coding tree structure. Such partitioning techniques can be referred to as multi-type tree (MTT) partitioning. By using MTT partitioning, video data can be partitioned more flexibly, thus allowing greater coding efficiency.
[0036] This disclosure further describes techniques for applying transforms to blocks partitioned according to an MTT structure, techniques for generating and parsing syntax elements indicating how blocks are divided according to the MTT structure, techniques for partitioning luma and chroma blocks according to an MTT structure, and techniques for coding (that is, encoding and/or decoding) blocks partitioned according to an MTT structure. The techniques described in this disclosure can be used individually, or together in any combination.
[0037] Figure 1 is a block diagram illustrating an exemplary video encoding and decoding system 10 that can use the techniques of this disclosure to partition blocks of video data, signal and parse partition types, and apply transforms and additional transform partitions. As shown in figure 1, system 10 includes a source device 12, which provides encoded video data to be decoded at a later time by a destination device 14. In particular, the source device 12 provides the video data to the destination device 14 via a computer-readable medium 16. The source device 12 and the destination device 14 can comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called smartphones, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, or the like. In some cases, the source device 12 and the destination device 14 may be equipped for wireless communication. Thus, the source device 12 and the destination device 14 can be wireless communication devices. The source device 12 is an exemplary video encoding device (i.e., a device for encoding video data). The destination device 14 is an exemplary video decoding device (for example, a device or apparatus for decoding video data).
[0038] In the example of figure 1, the source device 12 includes a video source 18, a storage medium 20 configured to store video data, a video encoder 22, and an output interface 24. The destination device 14 includes an input interface 26, a storage medium 28 configured to store encoded video data, a video decoder 30, and a display device 32. In other examples, the source device 12 and the destination device 14 may include other components or arrangements. For example, the source device 12 can receive video data from an external video source, such as an external camera. Likewise, the destination device 14 can interface with an external display device, instead of including an integrated display device.
[0039] The illustrated system 10 of figure 1 is merely an example. Techniques for processing video data can be performed by any digital video encoding and/or decoding device or apparatus. Although the techniques of this disclosure are generally performed by a video encoding device and a video decoding device, the techniques can also be performed by a combined video encoder/decoder, typically referred to as a codec. The source device 12 and the destination device 14 are merely examples of such encoding devices, in which the source device 12 generates encoded video data for transmission to the destination device 14. In some examples, the source device 12 and the destination device 14 operate in a substantially symmetrical manner, such that each of the source device 12 and the destination device 14 includes video encoding and decoding components. Consequently, system 10 can support unidirectional or bidirectional video transmission between the source device 12 and the destination device 14, for example, for video streaming, video playback, video broadcasting, or video telephony.
[0040] The video source 18 of the source device 12 may include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed interface for receiving video data from a video content provider. As another alternative, the video source 18 can generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. The source device 12 may comprise one or more data storage media (e.g., storage media 20) configured to store the video data. The techniques described in this disclosure may be applicable to video encoding in general, and can be applied to wireless and/or wired applications. In each case, the captured, pre-captured, or computer-generated video can be encoded by video encoder 22. The output interface 24 can output the encoded video information to the computer-readable medium 16.
[0041] The destination device 14 can receive the encoded video data to be decoded via the computer-readable medium 16. The computer-readable medium 16 can comprise any type of medium or device capable of moving the encoded video data from the source device 12 to the destination device 14. In some examples, the computer-readable medium 16 comprises a communication medium to allow the source device 12 to transmit encoded video data directly to the destination device 14 in real time. The encoded video data can be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to the destination device 14. The communication medium can comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium can form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from the source device 12 to the destination device 14. The destination device 14 may comprise one or more data storage media configured to store encoded video data and decoded video data.
[0042] In some examples, encoded data (for example, encoded video data) can be output from the output interface 24 to a storage device. Likewise, encoded data can be accessed from the storage device via the input interface 26. The storage device can include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data. In another example, the storage device can correspond to a file server or another intermediate storage device that can store the encoded video generated by the source device 12. The destination device 14 can access stored video data from the storage device via streaming or download. The file server can be any type of server capable of storing encoded video data and transmitting that encoded video data to the destination device 14. Exemplary file servers include a web server (for example, for a website), an FTP server, network attached storage (NAS) devices, or a local disk drive. The destination device 14 can access the encoded video data through any standard data connection, including an Internet connection. This can include a wireless channel (for example, a Wi-Fi connection), a wired connection (for example, DSL, a cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the storage device can be a streaming transmission, a download transmission, or a combination thereof.
[0043] The techniques of this disclosure can be applied to video encoding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet streaming video transmissions, such as dynamic adaptive streaming over HTTP (DASH), digital video that is encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 can be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
[0044] The computer-readable medium 16 may include transient media, such as a wireless broadcast or wired network transmission, or storage media (i.e., non-transient storage media), such as a hard disk, flash drive, compact disc, digital video disc, Blu-ray disc, or other computer-readable media. In some examples, a network server (not shown) can receive encoded video data from the source device 12 and provide the encoded video data to the destination device 14, for example, via network transmission. Likewise, a computing device of a media production facility, such as a disc stamping facility, can receive encoded video data from the source device 12 and produce a disc containing the encoded video data. Therefore, the computer-readable medium 16 can be understood to include one or more computer-readable media of various forms, in various examples.
[0045] The input interface 26 of the destination device 14 receives information from the computer-readable medium 16. The information of the computer-readable medium 16 can include syntax information defined by video encoder 22, which is also used by video decoder 30, and which includes syntax elements that describe the characteristics and/or processing of blocks and other coded units, for example, groups of pictures (GOPs). Storage media 28 can store the encoded video data received by the input interface 26. The display device 32 displays the decoded video data to a user. The display device 32 can comprise any of a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or another type of display device.
[0046] Video encoder 22 and video decoder 30 each can be implemented as any of a variety of suitable encoder or decoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combinations thereof. When the techniques are partially implemented in software, a device can store instructions for the software in a suitable, non-transitory, computer-readable medium and can execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 22 and video decoder 30 can be included in one or more encoders or decoders, either of which can be integrated as part of a combined encoder/decoder (CODEC) in a respective device.
[0047] In some examples, video encoder 22 and video decoder 30 may operate according to a video encoding standard. Exemplary video encoding standards include, but are not limited to, ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions. The High Efficiency Video Coding (HEVC) standard, or ITU-T H.265, including its range and screen content coding extensions, 3D video coding (3D-HEVC) and multiview extensions (MV-HEVC), and scalable extension (SHVC), was developed by the Joint Collaboration Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Motion Picture Experts Group (MPEG). Video encoder 22 and video decoder 30 can also be configured to operate according to a future video encoding standard, such as the video encoding standard being developed by the Joint Video Exploration Team (JVET) group. The JEM software is based on the HEVC model (HM) software and is the reference software for JVET.
[0048] In HEVC and other video encoding specifications, a video sequence typically includes a series of images. Images can also be referred to as frames. An image can include three sample arrays, denoted SL, SCb, and SCr. SL is a two-dimensional array (i.e., a block) of luma samples. SCb is a two-dimensional array of Cb chrominance samples. SCr is a two-dimensional array of Cr chrominance samples. Chrominance samples may also be referred to herein as chroma samples. In other cases, an image may be monochrome and may include only an array of luma samples.
[0049] In addition, in HEVC and other video encoding specifications, to generate an encoded representation of an image, video encoder 22 can generate a set of coding tree units (CTUs). Each of the CTUs may comprise a coding tree block of luma samples, two corresponding coding tree blocks of chrominance samples, and syntax structures used to encode the samples of the coding tree blocks. In monochrome images or images that have three separate color planes, a CTU can comprise a single coding tree block and syntax structures used to encode the samples of the coding tree block. A coding tree block can be an NxN block of samples. A CTU can also be referred to as a treeblock or a largest coding unit (LCU). The CTUs of HEVC can be broadly analogous to the macroblocks of other standards, such as H.264/AVC. However, a CTU is not necessarily limited to a particular size, and may include one or more coding units (CUs). A slice can include an integer number of CTUs ordered consecutively in a raster scan order.
[0050] If operating in accordance with HEVC, to generate an encoded CTU, video encoder 22 can recursively perform quadtree partitioning on the coding tree blocks of a CTU to divide the coding tree blocks into coding blocks, hence the name coding tree units. A coding block is an NxN block of samples. A CU can comprise a coding block of luma samples and two corresponding coding blocks of chroma samples of an image that has a luma sample array, a Cb sample array, and a Cr sample array, and syntax structures used to encode the samples of the coding blocks. In monochrome images or images that have three separate color planes, a CU can comprise a single coding block and syntax structures used to encode the samples of the coding block.
[0051] Syntax data within a bitstream can also define a size for the CTU. A slice includes a number of consecutive CTUs in coding order. A video frame or image can be divided into one or more slices. As mentioned above, each treeblock can be divided into coding units (CUs) according to a quadtree. In general, a quadtree data structure includes one node per CU, with a root node corresponding to the treeblock. If a CU is divided into four sub-CUs, the node corresponding to the CU includes four leaf nodes, each of which corresponds to one of the sub-CUs.
[0052] Each node in the quadtree data structure can provide syntax data for the corresponding CU. For example, a node in the quadtree can include a split flag, which indicates whether the CU corresponding to the node is divided into sub-CUs. Syntax elements for a CU can be defined recursively, and may depend on whether the CU is divided into sub-CUs. If a CU is not further divided, it is referred to as a leaf-CU. If a block of a CU is further divided, it can generally be referred to as a non-leaf-CU. In some examples of this disclosure, the four sub-CUs of a leaf-CU can be referred to as leaf-CUs even if there is no explicit division of the original leaf-CU. For example, if a 16x16 CU is not further divided, the four 8x8 sub-CUs can also be referred to as leaf-CUs although the 16x16 CU was never divided.
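The recursive split-flag structure described above can be sketched as follows. The callback `get_split_flag` is a hypothetical stand-in for reading the split flag syntax element from the bitstream:

```python
def parse_quadtree(get_split_flag, size, min_size, x=0, y=0):
    """Recursively split a block according to per-node split flags,
    mirroring the quadtree of [0052]. Returns (x, y, size) leaf-CUs.
    `get_split_flag(x, y, size)` stands in for a bitstream syntax element."""
    if size > min_size and get_split_flag(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):        # visit the four quadrants
            for dx in (0, half):
                leaves += parse_quadtree(get_split_flag, half, min_size,
                                         x + dx, y + dy)
        return leaves
    return [(x, y, size)]  # leaf-CU: no further split

# Split only the 64x64 root once, yielding four 32x32 leaf-CUs.
print(parse_quadtree(lambda x, y, size: size == 64, 64, 8))
```

Because the flags are read per node, the decoder recovers exactly the tree the encoder chose, without the tree itself ever being transmitted explicitly.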
[0053] A CU has a purpose similar to that of a macroblock of the H.264 standard, except that a CU does not have a size distinction. For example, a treeblock can be divided into four child nodes (also referred to as sub-CUs), and each child node can in turn be a parent node and be divided into four more child nodes. A final, undivided child node, referred to as a leaf node of the quadtree, comprises a coding node, also referred to as a leaf-CU. Syntax data associated with an encoded bitstream can define a maximum number of times that a treeblock can be divided, referred to as a maximum CU depth, and can also define a minimum size of the coding nodes. The depth of a treeblock structure can indicate the number of times that a block was divided. For example, depth 0 can relate to a block before any partitioning, depth 1 can relate to blocks created from a division of a parent block, depth 2 can relate to blocks created from a division of a block at depth 1, and so on. A bitstream can also define a smallest coding unit (SCU). This disclosure uses the term block to refer to any of a CU, PU, or TU, in the context of HEVC, or similar data structures in the context of other standards (for example, coding units in JEM, macroblocks and sub-blocks in H.264/AVC, etc.).
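Since each quadtree split halves the block in both dimensions, the side length at a given depth follows directly from the treeblock size; a minimal sketch of the depth/size relation described above:

```python
def cu_size_at_depth(ctu_size: int, depth: int) -> int:
    """Side length of a block after `depth` quadtree splits of a CTU;
    each split halves the width and height, as described in [0053]."""
    return ctu_size >> depth

# A 64x64 treeblock: depth 0 -> 64, depth 1 -> 32, depth 2 -> 16.
print([cu_size_at_depth(64, d) for d in range(3)])  # [64, 32, 16]
```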
[0054] A CU includes a coding node, as well as prediction units (PUs) and transform units (TUs) associated with the coding node. A size of the CU corresponds to a size of the coding node and can, in some examples, be square in shape. In the example of HEVC, the size of the CU can range from 8x8 pixels up to the size of the treeblock, with a maximum of 64x64 pixels or greater. Each CU can contain one or more PUs and one or more TUs. Syntax data associated with a CU can describe, for example, the partitioning of the CU into one or more PUs. Partitioning modes can differ according to whether the CU is skip- or direct-mode encoded, intra-prediction-mode encoded, or inter-prediction-mode encoded. PUs can be partitioned to be non-square in shape. Syntax data associated with a CU can also describe, for example, the partitioning of the CU into one or more TUs according to a quadtree. A TU can be square or non-square (e.g., rectangular) in shape.
[0055] The HEVC standard allows transformations according to TUs. TUs can be different for different CUs. TUs are typically sized based on the size of PUs within a given CU defined for a partitioned LCU, although this may not always be the case. TUs are typically the same size as or smaller than PUs. In some examples, residual samples corresponding to a CU can be subdivided into smaller units using a quadtree structure, sometimes called a residual quadtree (RQT). The leaf nodes of the RQT can be referred to as TUs. The pixel difference values associated with the TUs can be transformed to produce transform coefficients, which can be quantized.
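The quantization of transform coefficients mentioned above (and pictured as a dead-zone plus uniform scheme in figure 8) can be sketched as below; the dead-zone offset of 1/3 is an illustrative choice, not a value taken from this disclosure:

```python
def deadzone_quantize(coeff: int, qstep: int, offset: float = 1 / 3) -> int:
    """Dead-zone plus uniform scalar quantizer (cf. figure 8).
    An `offset` below 1/2 widens the zero bin (the dead zone), so small
    coefficients quantize to zero; the value 1/3 is illustrative."""
    sign = -1 if coeff < 0 else 1
    return sign * int(abs(coeff) / qstep + offset)

# Small coefficients fall into the dead zone and become zero.
print([deadzone_quantize(c, 4) for c in (-9, -1, 0, 1, 7)])  # [-2, 0, 0, 0, 2]
```

Widening the zero bin trades a little distortion on small coefficients for many more zero-valued levels, which entropy coding then compresses very cheaply.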
[0056] A leaf-CU can include one or more PUs. In general, a PU represents a spatial area corresponding to all or a portion of the corresponding CU, and can include data for obtaining a reference sample for the PU. In addition, a PU includes data related to prediction. For example, when the PU is intra-mode encoded, the data for the PU can be included in an RQT, which can include data describing an intra-prediction mode for a TU corresponding to the PU. As another example, when the PU is inter-mode encoded, the PU can include data defining one or more motion vectors for the PU. The data defining the motion vector for a PU can describe, for example, a horizontal component of the motion vector, a vertical component of the motion vector, a resolution for the motion vector (for example, one-quarter-pixel precision or one-eighth-pixel precision), a reference image to which the motion vector points, and/or a reference image list (for example, List 0, List 1, or List C) for the motion vector.
[0057] A leaf CU having one or more PUs can also include one or more TUs. TUs can be specified using an RQT (also referred to as a TU quadtree structure), as discussed above. For example, a split flag can indicate whether a leaf CU is divided into four transform units. In some examples, each transform unit can be further divided into additional sub-TUs. When a TU is not further divided, it can be referred to as a leaf TU. Generally, for intracoding, all leaf TUs belonging to a leaf CU contain residual data produced from the same intraprediction mode. That is, the same intraprediction mode is generally applied to calculate the predicted values that will be transformed in all the TUs of a leaf CU. For intracoding, video encoder 22 can calculate a residual value for each leaf TU using the
intraprediction mode, as a difference between the portion of the CU corresponding to the TU and the original block. A TU is not necessarily limited to the size of a PU. Thus, TUs can be larger or smaller than a PU. For intracoding, a PU can be co-located with a corresponding leaf TU for the same CU. In some
examples, the maximum size of a leaf TU can correspond to the size of the corresponding leaf CU.
[0058] In addition, TUs of leaf CUs can also be associated with respective RQT structures. That is, a leaf CU can include a quadtree indicating how the leaf CU is divided into TUs. The root node of a TU quadtree generally corresponds to a leaf CU, whereas the root node of a CU quadtree generally corresponds to a tree block (or LCU).
[0059] As discussed above, video encoder 22 can partition a CU's coding block into one or more prediction blocks. A prediction block is a rectangular (that is, square or non-square) block of samples to which the same prediction is applied. A PU of a CU can comprise a prediction block of luma samples, two corresponding prediction blocks of chroma samples, and syntax structures used to predict the prediction blocks. In monochrome images or images that have three separate color planes, a PU can comprise a single prediction block and syntax structures used to predict the prediction block. Video encoder 22 can generate predictive blocks (e.g., luma, Cb and Cr predictive blocks) for the prediction blocks (e.g., luma, Cb and Cr prediction blocks) of each PU of the CU.
[0060] Video encoder 22 can use intraprediction or interprediction to generate the predictive blocks for a PU. If video encoder 22 uses intraprediction to generate the predictive blocks of a PU, video encoder 22 can generate the predictive blocks of the PU based on decoded samples of the image that includes the PU.
[0061] After video encoder 22 generates predictive blocks (for example, luma, Cb and Cr predictive blocks) for one or more PUs of a CU, video encoder 22 can generate one or more residual blocks for the CU. For example, video encoder 22 can generate a residual luma block for the CU. Each sample in the CU's residual luma block indicates a difference between a luma sample in one of the CU's predictive luma blocks and a corresponding sample in the CU's original luma coding block. In addition, video encoder 22 can generate a residual Cb block for the CU. Each sample in the residual Cb block of a CU can indicate a difference between a Cb sample in one of the CU's predictive Cb blocks and a corresponding sample in the CU's original Cb coding block. Video encoder 22 can also generate a residual Cr block for the CU. Each sample in the CU's residual Cr block can indicate a difference between a Cr sample in one of the CU's predictive Cr blocks and a corresponding sample in the CU's original Cr coding block.
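The residual computation described above amounts to a per-sample difference between the original coding block and the predictive block; a minimal sketch (the helper name is illustrative, not from any codec API):

```python
def residual_block(original, predictive):
    # Each residual sample is the difference between a sample of the
    # original coding block and the corresponding predictive sample.
    return [o - p for o, p in zip(original, predictive)]
```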
[0062] In addition, as discussed above, video encoder 22 can use quadtree partitioning to decompose the residual blocks (for example, luma, Cb and Cr residual blocks) of a CU into one or more transform blocks (for example, luma, Cb and Cr transform blocks). A transform block is a rectangular (for example, square or non-square) block of samples to which the same transform is applied. A unit of
transform (TU) of a CU may comprise a transform block of luma samples, two corresponding transform blocks of chroma samples, and syntax structures used to transform the transform block samples. Thus, each TU of a CU can have a luma transform block, a Cb transform block and a Cr transform block. The TU's luma transform block can be a sub-block of the CU's residual luma block. The Cb transform block can be a sub-block of the CU's residual Cb block. The Cr transform block can be a sub-block of the CU's residual Cr block. In monochrome images or images that have three separate color planes, a TU can comprise a single transform block and syntax structures used to transform the samples of the transform block.
[0063] Video encoder 22 can apply one or more transforms to a transform block of a TU to generate a block of coefficients for the TU. For example, video encoder 22 can apply one or more transforms to a TU's luma transform block to generate a luminance coefficient block for the TU. A block of coefficients can be a two-dimensional array of transform coefficients. A transform coefficient can be a scalar quantity. The video encoder 22 can apply one or more transforms to a Cb transform block of a TU to generate a block of Cb coefficients for the TU. The video encoder 22 can apply one or more transforms to a TU Cr transform block to generate a Cr coefficient block for the TU.
[0064] In some examples, video encoder 22 skips application of the transforms to the transform block. In such examples, video encoder 22 can treat the residual sample values in the same way as transform coefficients. Thus, in examples where video encoder 22 skips application of the transforms, the following discussion of transform coefficients and coefficient blocks may be applicable to transform blocks of residual samples.
[0065] After generating a coefficient block (for example, a luma coefficient block, a Cb coefficient block or a Cr coefficient block), video encoder 22 can quantize the coefficient block to possibly reduce the amount of data used to represent the coefficient block, potentially providing further compression. Quantization generally refers to a process in which a range of values is compressed to a single value. For example, quantization can be done by dividing a value by a constant, and then rounding to the nearest integer. To quantize the coefficient block, video encoder 22 can quantize the transform coefficients of the coefficient block. After video encoder 22 quantizes a coefficient block, video encoder 22 can entropy encode syntax elements indicating the quantized transform coefficients. For example, video encoder 22 can perform Context-Adaptive Binary Arithmetic Coding (CABAC) or other entropy coding techniques on syntax elements
indicating the quantized transform coefficients.
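The divide-and-round quantization described above can be sketched as follows. This is a simplified illustration: HEVC derives the step size from a quantization parameter and uses integer arithmetic with rounding offsets, which is not shown here.

```python
def quantize(coeff, step):
    # Uniform scalar quantization: divide the transform coefficient by a
    # constant step size, then round to the nearest integer (half away
    # from zero).
    sign = -1 if coeff < 0 else 1
    return sign * int(abs(coeff) / step + 0.5)

def dequantize(level, step):
    # Inverse quantization: scale the level back by the step size. The
    # rounding above makes the overall process lossy.
    return level * step
```

For example, a coefficient of 37 with step size 8 quantizes to level 5, which dequantizes to 40 rather than the original 37.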
[0066] Video encoder 22 can output a bit stream that includes a sequence of bits that forms a representation of encoded images and associated data. Thus, the bit stream comprises an encoded representation of the video data. The bit stream may comprise a sequence of network abstraction layer (NAL) units. An NAL unit is a syntax structure that contains an indication of the type of data in the NAL unit and bytes that contain that data in the form of a raw byte sequence payload (RBSP) interspersed as necessary with emulation prevention bits. Each of the NAL units can include an NAL unit header and can encapsulate an RBSP. The NAL unit header can include a syntax element indicating an NAL unit type code. The NAL unit type code specified by the NAL unit header of an NAL unit indicates the type of the NAL unit. An RBSP can be a syntax structure containing an integer number of bytes that is encapsulated within an NAL unit. In some cases, an RBSP includes zero bits.
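As a concrete illustration of the emulation prevention mentioned above, the following sketch inserts an emulation prevention byte (0x03) wherever the RBSP would otherwise contain a byte pattern that could be mistaken for a start code. This mirrors the mechanism used in H.264/HEVC, though the function itself is an illustrative helper, not an API from either standard.

```python
def add_emulation_prevention(rbsp: bytes) -> bytes:
    # Insert 0x03 after any two consecutive zero bytes that are followed
    # by a byte value of 0x03 or less, so the patterns 0x000000, 0x000001
    # and 0x000002 never appear inside the NAL unit payload.
    out = bytearray()
    zero_run = 0
    for b in rbsp:
        if zero_run >= 2 and b <= 0x03:
            out.append(0x03)  # emulation prevention byte
            zero_run = 0
        out.append(b)
        zero_run = zero_run + 1 if b == 0x00 else 0
    return bytes(out)
```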
[0067] The video decoder 30 can receive a bit stream generated by the video encoder 22. Video decoder 30 can decode the bit stream to reconstruct images of the video data. As part of decoding the bit stream, the video decoder 30 can analyze the bit stream to obtain bit stream syntax elements. Video decoder 30 can reconstruct images from video data based, at least in part, on the syntax elements obtained from
the bit stream. The process for reconstructing the video data can generally be reciprocal to the process performed by video encoder 22. For example, video decoder 30 can use motion vectors of PUs to determine predictive blocks for the PUs of a current CU. In addition, video decoder 30 can inverse quantize coefficient blocks of TUs of the current CU. Video decoder 30 can perform inverse transforms on the coefficient blocks to reconstruct transform blocks of the TUs of the current CU. Video decoder 30 can reconstruct the coding blocks of the current CU by adding the samples of the predictive blocks for PUs of the current CU to corresponding samples of the transform blocks of the TUs of the current CU. By reconstructing the coding blocks for each CU of an image, video decoder 30 can reconstruct the image.
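The decoder-side addition described above (predictive samples plus residual samples, clipped to the valid sample range) can be sketched as follows; the function name and signature are illustrative.

```python
def reconstruct(predictive, residual, bit_depth=8):
    # Add each predictive sample to the corresponding residual sample and
    # clip the result to the valid range for the bit depth (0..255 for
    # 8-bit video).
    max_val = (1 << bit_depth) - 1
    return [min(max(p + r, 0), max_val)
            for p, r in zip(predictive, residual)]
```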
[0068] Common concepts and certain aspects of HEVC design are described below, with a focus on block partition techniques. In HEVC, the largest encoding unit in a slice is called a CTB. A CTB is divided according to a quadtree structure, the nodes of which are coding units. The plurality of nodes in a quadtree structure includes leaf nodes and non-leaf nodes. Leaf nodes do not have child nodes in the tree structure (that is, leaf nodes are not further divided). Non-leaf nodes include a root node of the tree structure. The root node corresponds to an initial video block of the video data (for example, a CTB). For each respective non-root node of the plurality of nodes, the
respective non-root node corresponds to a video block that is a sub-block of a video block corresponding to a parent node in the tree structure of the respective non-root node. Each respective non-leaf node of the plurality of non-leaf nodes has one or more child nodes in the tree structure.
[0069] The size of a CTB can range from 16x16 to 64x64 in the HEVC main profile (although, technically, 8x8 CTB sizes can be supported). A CTB can be recursively divided into CUs in a quadtree fashion, as described in W. J. Han et al., Improved Video Compression Efficiency Through Flexible Unit Representation and Corresponding Extension of Coding Tools, IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 12, pp. 1709-1720, Dec. 2010, and shown in figure 2. As shown in figure 2, each level of partitioning is a quadtree division into four sub-blocks. The black block is an example of a leaf node (that is, a block that is not further divided).
[0070] In some examples, a CU can be the same size as a CTB, although a CU can be as small as 8x8. Each CU is encoded with a coding mode, which can be, for example, an intracoding mode or an intercoding mode. Other coding modes are also possible, including coding modes for screen content (for example, intra block copy modes, palette-based coding modes, etc.). When a CU is intercoded (that is, inter mode is applied), the CU can be further divided into prediction units (PUs). For example, a CU can be divided into 2 or 4 PUs. In another example, the entire CU is treated as a single PU when additional partitioning is not applied. In HEVC examples, when two PUs are present in a CU, they can be rectangles of half the size of the CU or two rectangles with 1/4 or 3/4 the size of the CU.
[0071] In HEVC, there are eight partition modes for a CU encoded with the interprediction mode, i.e., PART_2Nx2N, PART_2NxN, PART_Nx2N, PART_NxN, PART_2NxnU, PART_2NxnD, PART_nLx2N and PART_nRx2N, as shown in figure 3. As shown in figure 3, a CU encoded with the PART_2Nx2N partition mode is not further divided. That is, the entire CU is treated as a single PU (PU0). A CU encoded with the PART_2NxN partition mode is symmetrically horizontally divided into two PUs (PU0 and PU1). A CU encoded with the PART_Nx2N partition mode is symmetrically vertically divided into two PUs. A CU encoded with the PART_NxN partition mode is symmetrically divided into four PUs of equal size (PU0, PU1, PU2, PU3).
[0072] A CU encoded with the PART_2NxnU partition mode is asymmetrically horizontally divided into a PU0 (upper PU) having 1/4 the size of the CU and a PU1 (lower PU) having 3/4 the size of the CU. A CU encoded with the PART_2NxnD partition mode is asymmetrically horizontally divided into a PU0 (upper PU) having 3/4 the size of the CU and a PU1 (lower PU) having 1/4 the size of the CU. A CU encoded with the PART_nLx2N partition mode is asymmetrically vertically divided into a PU0 (left PU) having 1/4 the size of the CU and a PU1 (right PU) having 3/4 the size of the CU. A CU
encoded with the PART_nRx2N partition mode is asymmetrically vertically divided into a PU0 (left PU) having 3/4 the size of the CU and a PU1 (right PU) having 1/4 the size of the CU.
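The PU dimensions implied by the eight partition modes above can be tabulated for a 2Nx2N CU. The helper below is an illustrative sketch (the PART_* names come from HEVC; the function itself does not).

```python
def pu_sizes(mode, n):
    # Width/height of each PU for a 2Nx2N CU under the eight HEVC inter
    # partition modes. The asymmetric modes split one dimension 1/4 : 3/4.
    s = 2 * n  # CU width and height
    return {
        'PART_2Nx2N': [(s, s)],
        'PART_2NxN':  [(s, n)] * 2,
        'PART_Nx2N':  [(n, s)] * 2,
        'PART_NxN':   [(n, n)] * 4,
        'PART_2NxnU': [(s, s // 4), (s, 3 * s // 4)],  # upper PU is 1/4
        'PART_2NxnD': [(s, 3 * s // 4), (s, s // 4)],  # lower PU is 1/4
        'PART_nLx2N': [(s // 4, s), (3 * s // 4, s)],  # left PU is 1/4
        'PART_nRx2N': [(3 * s // 4, s), (s // 4, s)],  # right PU is 1/4
    }[mode]
```

For a 32x32 CU (N = 16), PART_2NxnU yields a 32x8 upper PU and a 32x24 lower PU, and in every mode the PU areas sum to the CU area.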
[0073] When a CU is intercoded, one set of motion information (for example, motion vector, prediction direction and reference image) is present for each PU. In addition, each PU is coded with a unique interprediction mode to derive the set of motion information. However, it must be understood that, even if two PUs are uniquely encoded, they may still have the same motion information in some circumstances.
[0074] In Block partitioning structure for next generation video coding, International Telecommunication Union, COM16-C966, Sep. 2015 (hereinafter, VCEG proposal COM16-C966), quadtree-binary-tree (QTBT) partitioning techniques were proposed for a future video coding standard beyond HEVC. Simulations showed that the proposed QTBT structure is more efficient than the quadtree structure used in HEVC.
[0075] In the QTBT structure of the VCEG COM16-C966 proposal, a CTB is first partitioned using quadtree partitioning techniques, where the quadtree division of a node can be iterated until the node reaches the minimum allowed quadtree leaf node size. The minimum allowed quadtree leaf node size can be indicated to the video decoder by the value of the MinQTSize syntax element. If the quadtree leaf node size is not greater than the maximum allowed binary tree root node size (for example, as indicated by a MaxBTSize syntax element), the quadtree leaf node can be further partitioned using binary tree partitioning. The binary tree partitioning of a node can be repeated until the node reaches the minimum allowed binary tree leaf node size (for example, as indicated by a MinBTSize syntax element) or the maximum allowed binary tree depth (for example, as indicated by a MaxBTDepth syntax element). The VCEG COM16-C966 proposal uses the term CU to refer to binary tree leaf nodes. In the VCEG COM16-C966 proposal, CUs are used for prediction (for example, intraprediction, interprediction, etc.) and transform without any further partitioning. In general, according to QTBT techniques, there are two split types for binary tree splitting: symmetric horizontal splitting and symmetric vertical splitting. In each case, a block is split by dividing the block in half, either horizontally or vertically.
[0076] In an example of the QTBT partitioning structure, the CTU size is set to 128x128 (for example, a 128x128 luma block and two corresponding 64x64 chroma blocks), MinQTSize is set to 16x16, MaxBTSize is set to 64x64, MinBTSize (for width and height) is set to 4, and MaxBTDepth is set to 4. Quadtree partitioning is first applied to CTU to generate quadtree leaf nodes. Quadtree leaf nodes can have a size of 16x16 (that is, the MinQTSize is 16x16) to 128x128 (that is, the size of the CTU). According to an example of QTBT partitioning, if the quadtree leaf node is 128x128, the quadtree leaf node cannot
be further divided by the binary tree, since the size of the quadtree leaf node exceeds MaxBTSize (that is, 64x64). Otherwise, the quadtree leaf node is further partitioned by the binary tree. Therefore, the quadtree leaf node is also the root node for the binary tree and has a binary tree depth of 0. When the binary tree depth reaches MaxBTDepth (for example, 4), there is no further splitting. When a binary tree node has a width equal to MinBTSize (for example, 4), there is no further horizontal splitting. Likewise, when a binary tree node has a height equal to MinBTSize, there is no further vertical splitting. The leaf nodes of the binary tree (the CUs) are then further processed (for example, by performing a prediction process and a transform process) without any additional partitioning.
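The split decisions in the example above reduce to simple size and depth checks. The sketch below uses the example parameter values from the text; the function names are illustrative, not QTBT syntax elements.

```python
MIN_QT_SIZE = 16   # MinQTSize
MAX_BT_SIZE = 64   # MaxBTSize
MIN_BT_SIZE = 4    # MinBTSize
MAX_BT_DEPTH = 4   # MaxBTDepth

def can_quad_split(size):
    # Quadtree splitting may continue only while the resulting leaves
    # would still be at least MinQTSize.
    return size // 2 >= MIN_QT_SIZE

def can_binary_split(width, height, bt_depth, horizontal):
    # A quadtree leaf larger than MaxBTSize cannot start a binary tree,
    # and binary splitting stops at MaxBTDepth or MinBTSize.
    if max(width, height) > MAX_BT_SIZE or bt_depth >= MAX_BT_DEPTH:
        return False
    if horizontal:  # a horizontal split halves the height
        return height // 2 >= MIN_BT_SIZE
    return width // 2 >= MIN_BT_SIZE  # a vertical split halves the width
```

A 128x128 quadtree leaf cannot be binary split (it exceeds MaxBTSize), while a 16x16 leaf can no longer be quad split.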
[0077] Figure 4A illustrates an example of a block 50 (for example, a CTB) partitioned using QTBT partitioning techniques. As shown in figure 4A, using the QTBT partitioning techniques, each of the resulting blocks is split symmetrically through the center of each block. Figure 4B illustrates the tree structure corresponding to the block partitioning of figure 4A. The solid lines in figure 4B indicate quadtree splitting and the dotted lines indicate binary tree splitting. In one example, in each splitting (that is, non-leaf) node of the binary tree, a syntax element (for example, a flag) is signaled to indicate the type of split performed (for example, horizontal or vertical), where 0 indicates horizontal splitting and 1 indicates
vertical splitting. For quadtree splitting, there is no need to indicate the split type, since quadtree splitting always splits a block horizontally and vertically into 4 sub-blocks of equal size.
[0078] As shown in figure 4B, at node 70, block 50 is split into the four blocks 51, 52, 53 and 54, shown in figure 4A, using QT partitioning. Block 54 is not further split, and is therefore a leaf node. At node 72, block 51 is further split into two blocks using BT partitioning. As shown in figure 4B, node 72 is marked with a 1, indicating vertical splitting. As such, the split at node 72 results in block 57 and the block including both blocks 55 and 56. Blocks 55 and 56 are created by a further vertical split at node 74. At node 76, block 52 is further split into two blocks 58 and 59 using BT partitioning. As shown in figure 4B, node 76 is marked with a 0, which indicates horizontal splitting.
[0079] At node 78, block 53 is split into 4 blocks of equal size using QT partitioning. Blocks 63 and 66 are created from this QT partitioning and are not further split. At node 80, the upper-left block is first split using vertical binary tree splitting, which results in block 60 and a right vertical block. The right vertical block is then split using horizontal binary tree splitting into blocks 61 and 62. The lower-right block created from the quadtree split at node 78 is split at node 84 using horizontal binary tree splitting into blocks 64 and 65.
[0080] Although the QTBT structure described above shows better coding performance than the quadtree structure used in HEVC, the QTBT structure lacks flexibility. For example, in the QTBT structure described above, a quadtree node can be further split with a binary tree, but a binary tree node cannot be further split with a quadtree. In another example, both the quadtree and the binary tree can only achieve even splitting (that is, splitting through the center of the block), which is not efficient when an object is in the center of a block to be split. Therefore, the coding performance of QTBT may be lacking for future video coding standards.
[0081] To resolve the problems mentioned above, US Patent Publication No. 20170208336, filed on January 12, 2017, and US Patent Publication No. 20170272782, filed on March 20, 2017, both of which are incorporated herein by reference, describe several examples of a multi-type tree (MTT) partition structure. According to an MTT partitioning structure, a tree node can be further split with several tree types, such as a binary tree, a symmetric center-side triple tree, and a quadtree. Simulations showed that the multi-type tree structure was much more efficient than the quadtree-binary-tree structure.
[0082] To better achieve more flexible partitioning for a CTU, an MTT-based CU structure is proposed to replace CU structures based on QT, BT, and / or QTBT. The MTT partitioning structure of this
disclosure is still a recursive tree structure. However, several different partition structures (for example, three or more) are used. For example, according to the MTT techniques of this disclosure, three or more different partition structures can be used at each depth of a tree structure. In this context, the depth of a node in a tree structure can refer to the length of the path (for example, the number of splits) from the node to the root node of the tree structure.
[0083] In an example according to the techniques of this disclosure, video encoder 22 and/or video decoder 30 can be configured to receive an image of video data, partition the image of video data into a plurality of blocks using three or more different partition structures, and encode/reconstruct the plurality of blocks of the image of video data. In one example, partitioning the image of video data comprises partitioning the image of video data into a plurality of blocks using three or more different partition structures, where at least three of the three or more different partition structures can be used at each depth of a tree structure that represents how the image of video data is partitioned. In one example, the three or more different partition structures include a triple tree partition structure, and video encoder 22 and/or video decoder 30 can be configured to split one of the plurality of blocks of video data using a triple tree partition type of the partition structure of
the triple tree, in which the triple tree partition structure divides the one of the plurality of blocks into three sub-blocks without splitting the one of the plurality of blocks through the center. In a further example of the disclosure, the three or more different partition structures further include a quadtree partition structure and a binary tree partition structure.
[0084] Thus, in one example, video encoder 22 can generate an encoded representation of an initial video block (e.g., an encoding tree block or CTU) of video data. As part of generating the encoded representation of the initial video block, a video encoder 22 determines a tree structure comprising a plurality of nodes. For example, video encoder 22 can partition a tree block using the MTT partitioning structure of this disclosure.
[0085] The plurality of nodes in the MTT partitioning structure includes a plurality of leaf nodes and a plurality of non-leaf nodes. Leaf nodes have no child nodes in the tree structure. Non-leaf nodes include a root node of the tree structure. The root node corresponds to the initial video block. For each respective non-root node of the plurality of nodes, the respective non-root node corresponds to a video block (for example, an encoding block), which is a sub-block of a video block corresponding to a parent node in the tree structure of the respective non-root node. Each respective non-leaf node of the plurality of non-leaf nodes has one or more child nodes in the tree structure. In some examples, a non-leaf node in
an image boundary can have only one child node due to forced division and one of the child nodes corresponds to a block outside the image boundary.
[0086] According to the techniques of this disclosure, for each respective non-leaf node of the tree structure, at each depth level of the tree structure, there is a plurality of allowed splitting patterns (for example, partition structures) for the respective non-leaf node. For example, there may be three or more partition structures allowed at each depth of the tree structure. Video encoder 22 can be configured to partition a video block corresponding to the respective non-leaf node into video blocks corresponding to the child nodes of the respective non-leaf node according to one of the plurality of allowed partition structures. Each respective allowed partition structure of the plurality of allowed partition structures may correspond to a different way of partitioning the video block corresponding to the respective non-leaf node into video blocks corresponding to the child nodes of the respective non-leaf node. In addition, in the present example, video encoder 22 may include the encoded representation of the initial video block in a bit stream that comprises an encoded representation of the video data.
[0087] In a similar example, video decoder 30 can determine a tree structure comprising a plurality of nodes. As in the previous example, the plurality of nodes includes a plurality of leaf nodes and a plurality of non-leaf nodes. The leaf nodes do not have child nodes in the tree structure. The non-leaf nodes include a root node of the tree structure. The root node corresponds to an initial video block of the video data. For each respective non-root node of the plurality of nodes, the respective non-root node corresponds to a video block that is a sub-block of a video block corresponding to a parent node in the tree structure of the respective non-root node. Each respective non-leaf node of the plurality of non-leaf nodes has one or more child nodes in the tree structure. For each respective non-leaf node of the tree structure, at each depth level of the tree structure, there is a plurality of allowed splitting patterns for the respective non-leaf node, and the video block corresponding to the respective non-leaf node is divided into video blocks corresponding to the child nodes of the respective non-leaf node according to one of the plurality of allowed splitting patterns. Each respective allowed splitting pattern of the plurality of allowed splitting patterns corresponds to a different way of partitioning the video block corresponding to the respective non-leaf node into video blocks that correspond to the child nodes of the respective non-leaf node. In addition, in this example, for each (or at least one) respective leaf node of the tree structure, video decoder 30 reconstructs the video block corresponding to the respective leaf node.
[0088] In some such examples, for each respective non-leaf node of the tree structure other than the root node, the plurality of allowed splitting patterns (for example, partition structures) for the respective non-leaf node is independent of the partition structure according to which a video block corresponding to a parent node of the respective non-leaf node is divided into video blocks corresponding to child nodes of the parent node of the respective non-leaf node.
[0089] In other examples of the disclosure, at each depth of the tree structure, video encoder 22 can be configured to further split sub-trees using a particular partition type from among one of three or more partitioning structures. For example, video encoder 22 can be configured to determine a particular partition type from QT, BT, triple tree (TT) and other partitioning structures. In one example, the QT partitioning structure can include square quadtree and rectangular quadtree partition types. Video encoder 22 can partition a square block using square quadtree partitioning by splitting the block, through the center both horizontally and vertically, into four square blocks of equal size. Likewise, video encoder 22 can partition a rectangular (for example, non-square) block using rectangular quadtree partitioning by splitting the rectangular block, through the center both horizontally and vertically, into four rectangular blocks of equal size.
[0090] The BT partitioning structure can include the horizontal symmetric binary tree, vertical symmetric binary tree, horizontal non-symmetric binary tree, and vertical non-symmetric binary tree partition types. For the horizontal symmetric binary tree partition type, video encoder 22 can be configured to split a block, through the center of the block horizontally, into two symmetric blocks of the same size. For the vertical symmetric binary tree partition type, video encoder 22 can be configured to split a block, through the center of the block vertically, into two symmetric blocks of the same size. For the horizontal non-symmetric binary tree partition type, video encoder 22 can be configured to split a block horizontally into two blocks of different sizes. For example, one block can be 1/4 the size of the parent block and the other block can be 3/4 the size of the parent block, as in the PART_2NxnU or PART_2NxnD partition types in figure 3. For the vertical non-symmetric binary tree partition type, video encoder 22 can be configured to split a block vertically into two blocks of different sizes. For example, one block can be 1/4 the size of the parent block and the other block can be 3/4 the size of the parent block, as in the PART_nLx2N or PART_nRx2N partition types in figure 3.
[0091] In other examples, an asymmetric binary tree partition type can similarly divide a block into fractions of different sizes. For example, one sub-block can be 3/8 of the parent block and the other sub-block can be 5/8 of the parent block. Naturally, such a partition can be vertical or horizontal.
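The symmetric and asymmetric binary splits above differ only in the fraction at which one dimension of the block is divided; a small illustrative helper (not part of any standard):

```python
from fractions import Fraction

def bt_split_sizes(size, frac=Fraction(1, 2)):
    # Sizes of the two sub-blocks when one dimension of a block is split
    # at fraction `frac`: 1/2 for the symmetric types, 1/4 (or 3/4) and
    # 3/8 (or 5/8) for the asymmetric types described above.
    first = int(size * frac)
    return first, size - first
```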
[0092] The TT partitioning structure differs from the QT and BT structures in that the TT partitioning structure does not split a block through the center. The central region of the block remains together in the same sub-block.
Unlike QT, which produces four blocks, or the binary tree, which produces two blocks, splitting according to a TT partitioning structure produces three blocks. Example partition types according to the TT partitioning structure include the horizontal symmetric triple tree, vertical symmetric triple tree, horizontal non-symmetric triple tree, and vertical non-symmetric triple tree partition types. For the horizontal symmetric triple tree partition type, video encoder 22 can be configured to split a block horizontally into three blocks without splitting the block through the center. When split according to the horizontal symmetric triple tree partitioning, the blocks above and below the central sub-block are mirrored, that is, they are the same size. If a block is directly divisible by three (for example, 12 samples high), the central block can be the same size as the upper and lower blocks. If a block is not directly divisible by three (for example, 8 samples high), the central block can be a different size than the upper and lower blocks. For example, for a block 8 samples high, the upper and lower blocks can be 3 samples high and the central block can be 2 samples high. In another example, for a block 8 samples high, the upper and lower blocks can be 2 samples high and the central block can be 4 samples high. Figure 5E shows an example of horizontal triple tree partitioning.
[0093] For the vertical symmetric triple tree partition type, video encoder 22 can be configured to split a block vertically into three blocks without splitting the block through the center. When split according to vertical symmetric triple tree partitioning, the blocks to the left and right of the central sub-block are mirrored, that is, they are the same size. If a block is directly divisible by three (for example, 12 samples wide), the central block can be the same size as the left and right blocks. If a block is not directly divisible by three (for example, 8 samples wide), the central block can be a different size than the left and right blocks. For example, for a block 8 samples wide, the left and right blocks can be 3 samples wide and the central block can be 2 samples wide. In another example, for a block 8 samples wide, the left and right blocks can be 2 samples wide and the central block can be 4 samples wide. Figure 5D shows an example of vertical triple tree partitioning.
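The symmetric triple tree (TT) sub-block sizes described in the two paragraphs above can be sketched as follows. This is not from the patent: the function name, the default rounding choice, and the tuple return format are this sketch's own assumptions; only the (4, 4, 4), (3, 2, 3), and (2, 4, 2) examples come from the text.

```python
# Hypothetical sketch: deriving (side, center, side) sizes for a
# symmetric TT split of one dimension. When the dimension is divisible
# by three, all three parts can be equal; otherwise the two mirrored
# side blocks get equal sizes and the center takes the remainder.

def tt_symmetric_sizes(length, side=None):
    """Return (side, center, side) sizes for a length-sample dimension.

    `side` may be given explicitly (e.g. 2 or 3 for length 8, as in
    the text); by default the sides take round(length / 3) samples.
    """
    if side is None:
        side = round(length / 3)
    center = length - 2 * side
    assert center > 0, "side blocks too large"
    return (side, center, side)

print(tt_symmetric_sizes(12))          # (4, 4, 4): divisible by three
print(tt_symmetric_sizes(8, side=3))   # (3, 2, 3): first 8-sample example
print(tt_symmetric_sizes(8, side=2))   # (2, 4, 2): second 8-sample example
```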
[0094] For the horizontal non-symmetric triple tree partition type, video encoder 22 can be configured to split a block horizontally into three blocks that are not symmetrically mirrored. In some examples, the horizontal non-symmetric triple tree partition type may split a block through the center, and in other examples it may not. For the vertical non-symmetric triple tree partition type, video encoder 22 can be configured to split a block vertically into three blocks that are not symmetrically mirrored. In some examples, the vertical non-symmetric triple tree partition type may split a block through the center, and in other examples it may not.
[0095] In examples where a block (for example, a sub-tree node) is split with a non-symmetric triple tree partition type, video encoder 22 and/or video decoder 30 may apply a restriction such that two of the three partitions have the same size. Such a restriction may correspond to a limitation that video encoder 22 must comply with when encoding video data. In addition, in some examples, video encoder 22 and video decoder 30 may apply a restriction that the sum of the areas of two partitions is equal to the area of the remaining partition when splitting according to a non-symmetric triple tree partition type. For example, video encoder 22 can generate, or video decoder 30 can receive, an encoded representation of the initial video block that conforms to a restriction specifying that, when a video block corresponding to a node of the tree structure is partitioned according to a non-symmetric triple tree pattern, the node has a first child node, a second child node, and a third child node, with the second child node corresponding to a video block between the video blocks corresponding to the first and third child nodes, the video blocks corresponding to the first and third child nodes are the same size, and the sum of the sizes of the video blocks corresponding to the first and third child nodes is equal to the size of the video block corresponding to the second child node.
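The bitstream conformance restriction described above can be expressed as a simple check. This is not from the patent: the function name and the list-of-areas interface are this sketch's own assumptions.

```python
# Hypothetical sketch of the non-symmetric TT restriction above: the
# first and third child blocks must be the same size, and the sum of
# their sizes must equal the size of the second (center) child block.

def tt_constraint_ok(areas):
    """areas = [first, second (center), third] child block sizes."""
    first, second, third = areas
    return first == third and first + third == second

print(tt_constraint_ok([8, 16, 8]))   # True: 8 == 8 and 8 + 8 == 16
print(tt_constraint_ok([4, 24, 4]))   # False: 4 + 4 != 24
```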
[0096] In some examples of the disclosure, video encoder 22 can be configured to select from all of the partition types mentioned above for each of the QT, BT, and TT partition structures. In other examples, video encoder 22 can be configured to determine a partition type from only a subset of the partition types mentioned above. For example, a subset of the partition types discussed above (or other partition types) can be used for certain block sizes or for certain depths of a quadtree structure. The subset of supported partition types can be signaled in the bit stream for use by video decoder 30, or it can be predefined such that video encoder 22 and video decoder 30 can determine the subsets without any signaling.
[0097] In other examples, the number of supported partitioning types can be fixed for all depths in all CTUs. That is, video encoder 22 and video decoder 30 can be preconfigured to use the same number of partitioning types for any depth of a CTU. In other examples, the number of supported partitioning types may vary and may be dependent on depth, slice type, or other previously coded information. In one example, at depth 0 or depth 1 of the tree structure, only the QT partition structure is used. At depths greater than 1, each of the QT, BT, and TT partition structures can be used.
[0098] In some examples, video encoder 22 and/or video decoder 30 may apply preconfigured restrictions on the supported partitioning types in order to avoid duplicate partitioning for a given region of a video frame or region of a CTU. In one example, when a block is split with a non-symmetric partition type, video encoder 22 and/or video decoder 30 can be configured not to further split the largest sub-block that is split from the current block. For example, when a square block is split according to a non-symmetric partition type (for example, partition type PART_2NxnU in figure 3), the largest sub-block among all sub-blocks (for example, PU1 of partition type PART_2NxnU in figure 3) is treated as a leaf node and cannot be further split. However, the smallest sub-block (for example, PU0 of partition type PART_2NxnU in figure 3) can be further split.
[0099] As another example where restrictions on supported partitioning types can be applied to avoid duplicate partitioning for a given region, when a block is split with a non-symmetric partition type, the largest sub-block that is split from the current block cannot be further split in the same direction. For example, when a square block is split with a non-symmetric partition type (for example, partition type PART_2NxnU in figure 3), video encoder 22 and/or video decoder 30 can be configured not to split the largest sub-block among all sub-blocks (for example, PU1 of partition type PART_2NxnU in figure 3) in the horizontal direction. However, video encoder 22 and/or video decoder 30, in the present example, can further split PU1 in the vertical direction.
[0100] As another example where restrictions on supported partitioning types can be applied, to avoid difficulty in further splitting, video encoder 22 and/or video decoder 30 can be configured not to split a block, either horizontally or vertically, when the width/height of the block is not a power of 2 (for example, when the width or height is not 2, 4, 8, 16, etc.).
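The restriction in the paragraph above reduces to a power-of-two test on the block's dimensions. The sketch below is not from the patent; the function names are this sketch's own.

```python
# Hypothetical sketch of the restriction in [0100]: a further
# horizontal or vertical split is disallowed when either dimension of
# the block is not a power of 2.

def is_power_of_two(n):
    # A positive integer n is a power of two iff it has one set bit.
    return n > 0 and (n & (n - 1)) == 0

def may_split_further(width, height):
    return is_power_of_two(width) and is_power_of_two(height)

print(may_split_further(16, 8))   # True: both are powers of two
print(may_split_further(12, 16))  # False: 12 is not a power of two
```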
[0101] The examples above describe how video encoder 22 can be configured to perform MTT partitioning according to the techniques of this disclosure. Video decoder 30 can then also apply the same MTT partitioning as was performed by video encoder 22. In some instances, how a frame of video data was partitioned by video encoder 22 can be determined by applying the same set of predefined rules at video decoder 30. However, in many situations, video encoder 22 may determine a particular partition structure and partition type to use based on rate-distortion criteria for the particular frame of video data being encoded. As such, in order for video decoder 30 to determine the partitioning for a particular frame, video encoder 22 may signal syntax elements in the encoded bit stream that indicate how the frame, and the CTUs of the frame, are to be partitioned. Video decoder 30 can parse such syntax elements and partition the frame and CTUs accordingly.
[0102] In one example of the disclosure, video encoder 22 can be configured to signal a particular subset of the supported partition types as a high-level syntax element in a sequence parameter set (SPS), a picture parameter set (PPS), a slice header, an adaptive parameter set (APS), or any other high-level syntax parameter set. For example, the maximum number of partition types and which types are supported can be predefined, or signaled in the bit stream as a high-level syntax element in the sequence parameter set (SPS), picture parameter set (PPS), or any other high-level syntax parameter set. Video decoder 30 can be configured to receive and parse such a syntax element to determine the particular subset of partition types that are in use and/or the maximum number of partition structures (for example, QT, BT, TT, etc.) and types that are supported.
[0103] In some examples, at each depth, video encoder 22 can be configured to signal an index that indicates the selected partition type used at that depth of the tree structure. In addition, in some examples, video encoder 22 may adaptively signal such a partition type index in each CU; that is, the index may be different for different CUs. For example, video encoder 22 can set the partition type index based on one or more rate-distortion calculations. In one example, the signaling of the partition type (for example, the partition type index) can be skipped if a certain condition is met. For example, video encoder 22 can skip signaling the partition type when there is only one supported partition type associated with a given depth. In this example, when approaching an image boundary, a region to be coded may be smaller than a CTU. Consequently, in this example, CTUs can be forced to be split to fit the image boundary. In one example, only a symmetric binary tree is used for the forced split and no partition type is signaled. In some examples, at a certain depth, the partition type can be derived based on previously coded information, such as the slice type, the CTU depth, or the CU position.
[0104] In another example of the disclosure, for each CU (leaf node), video encoder 22 can additionally be configured to signal a syntax element (for example, a one-bit transform_split flag) to indicate whether a transform is to be performed with the same size as the CU or not (that is, the flag indicates whether the TU is the same size as the CU or is further split). When the transform_split flag is signaled as true, video encoder 22 can be configured to further split the residual of the CU into multiple sub-blocks, and the transform is performed on each sub-block. Video decoder 30 can perform the reciprocal process.
[0105] In one example, when the transform_split flag is signaled as true, the following is performed. If the CU corresponds to a square block (that is, the CU is square), then video encoder 22 splits the residual into four square sub-blocks using quadtree splitting, and the transform is performed on each square sub-block. If the CU corresponds to a non-square block, for example, MxN, then video encoder 22 splits the residual into two sub-blocks, and the sub-block size is 0.5MxN when M > N, and Mx0.5N when M < N. As another example, when the transform_split flag is signaled as true and the CU corresponds to a non-square block, for example, MxN (that is, the CU is non-square), video encoder 22 can be configured to split the residual into sub-blocks of size KxK, and the KxK square transform is used for each sub-block, where K is equal to the greatest common factor of M and N. As another example, no transform_split flag is signaled when the CU is a square block.
[0106] In some examples, no split flag is signaled, and only a transform with a derived size is used when there is residual in the CU after prediction. For example, for a CU with size MxN, the KxK square transform is used, where K is equal to the greatest common factor of M and N. Thus, in this example, for a CU with size 16x8, an 8x8 transform can be applied to two 8x8 sub-blocks of residual CU data. A split flag is a syntax element indicating that a node in a tree structure has child nodes in the tree structure.
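The derived transform size described in the two paragraphs above can be sketched as follows. This is not from the patent: the sketch reads "the greatest common factor of M and N" as their greatest common divisor, which reproduces the 16x8 example (K = 8, two 8x8 transforms); the function names are this sketch's own.

```python
# Hypothetical sketch: deriving the KxK square transform size for an
# MxN CU, where K is the greatest common factor (gcd) of M and N, and
# counting how many KxK sub-blocks of residual the CU is split into.

from math import gcd

def derived_transform_size(m, n):
    return gcd(m, n)

def num_subblocks(m, n):
    k = derived_transform_size(m, n)
    return (m // k) * (n // k)

print(derived_transform_size(16, 8))  # 8
print(num_subblocks(16, 8))           # 2: two 8x8 sub-blocks of residual
```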
[0107] In some examples, for each CU, if the CU is not split with square quadtree or symmetric binary tree splitting, video encoder 22 is configured to always set the transform size equal to the partition size (for example, the CU size).
[0108] It should be understood that, for each of the examples described above with reference to video encoder 22, video decoder 30 can be configured to perform a reciprocal process. With respect to signaled syntax elements, video decoder 30 can be configured to receive and parse such syntax elements, and to partition and decode the associated video data accordingly.
[0109] In one specific example of the disclosure, a video decoder can be configured to split video blocks according to three different partition structures (QT, BT, and TT), with five different partitioning types allowed at each depth. The partitioning types include quadtree partitioning (QT partition structure), horizontal binary tree partitioning (BT partition structure), vertical binary tree partitioning (BT partition structure), horizontal center-side triple tree partitioning (TT partition structure), and vertical center-side triple tree partitioning (TT partition structure), as shown in figures 5A-5E.
[0110] The definitions of the five exemplary partitioning types are as follows. Note that a square is considered to be a special case of a rectangle.
• Quadtree partitioning: a block is further split into four rectangular blocks of the same size. Figure 5A shows an example of quadtree partitioning.
• Vertical binary tree partitioning: a block is vertically divided into two rectangular blocks of the same size. Figure 5B is an example of vertical binary tree partitioning.
• Horizontal binary tree partitioning: a block is divided horizontally into two rectangular blocks of the same size. Figure 5C is an example of horizontal binary tree partitioning.
• Vertical center-side triple tree partitioning: a block is split vertically into three rectangular blocks such that the two side blocks share the same size, while the size of the center block is the sum of the sizes of the two side blocks. Figure 5D is an example of vertical center-side triple tree partitioning.
• Horizontal center-side triple tree partitioning: a block is split horizontally into three rectangular blocks such that the two side blocks share the same size, while the size of the center block is the sum of the sizes of the two side blocks. Figure 5E is an example of horizontal center-side triple tree partitioning.
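The five partitioning types defined above can be enumerated as the list of sub-block dimensions each produces. This sketch is not from the patent: the type names, the return format, and the choice of quarter-size TT side blocks (so that the center block equals the sum of the two sides) are this sketch's own assumptions.

```python
# Hypothetical sketch: (width, height) sub-blocks produced by each of
# the five partitioning types, for a parent block of size w x h.

def split(kind, w, h):
    if kind == "QT":       # quadtree: four equal rectangles
        return [(w // 2, h // 2)] * 4
    if kind == "BT_VER":   # vertical binary tree: two equal rectangles
        return [(w // 2, h)] * 2
    if kind == "BT_HOR":   # horizontal binary tree: two equal rectangles
        return [(w, h // 2)] * 2
    if kind == "TT_VER":   # vertical center-side TT: sides equal,
        return [(w // 4, h), (w // 2, h), (w // 4, h)]  # center = sum
    if kind == "TT_HOR":   # horizontal center-side TT
        return [(w, h // 4), (w, h // 2), (w, h // 4)]
    raise ValueError(kind)

for kind in ("QT", "BT_VER", "BT_HOR", "TT_VER", "TT_HOR"):
    parts = split(kind, 32, 32)
    # Every split type must cover the parent block's area exactly.
    assert sum(pw * ph for pw, ph in parts) == 32 * 32
    print(kind, parts)
```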
[0111] For a block associated with a particular depth, video encoder 22 determines which partition type (including no further split) is used and signals the determined partition type explicitly or implicitly (for example, the partition type can be derived from predetermined rules) to video decoder 30. Video encoder 22 can determine the partition type to use based on a rate-distortion cost check for the block using the different partition types. To obtain the rate-distortion cost, video encoder 22 may need to check possible partitioning types for the block recursively.
[0112] Figure 6 is a conceptual diagram that illustrates an example of coding tree unit (CTU) partitioning. In other words, figure 6 illustrates the partitioning of a CTB 80 corresponding to a CTU. In the example of figure 6:
• At depth 0, CTB 80 (that is, the entire CTB) is split into two blocks with horizontal binary tree partitioning (as indicated by line 82, dashed with individual dots).
• At depth 1:
• The upper block is split into three blocks with vertical center-side triple tree partitioning (as indicated by lines 84 and 86, with small dashes).
• The lower block is split into four blocks with quadtree partitioning (as indicated by lines 88 and 90, with small dashes separated by colons).
• At depth 2:
• The left block of the upper block at depth 1 is split into three blocks with horizontal center-side triple tree partitioning (as indicated by lines 92 and 94, with long dashes separated by short dashes).
• No further split for the center and right blocks of the upper block at depth 1.
• No further split for the four blocks of the lower block at depth 1.
[0113] As can be seen in the example of figure 6, three different partition structures (BT, QT, and TT) are used, with four different partition types (horizontal binary tree partitioning, vertical center-side triple tree partitioning, quadtree partitioning, and horizontal center-side triple tree partitioning).
[0114] In another example, additional restrictions can be applied to blocks at a certain depth or with a certain size. For example, if the height/width of a block is smaller than 16 pixels, the block cannot be split with the vertical/horizontal center-side triple tree, to avoid having a block with a height/width smaller than 4 pixels.
[0115] In F. Le Leannec, T. Poirier, F. Urban, "Asymmetric Coding Units in QTBT," JVET-D0064, Chengdu, Oct. 2016 (hereinafter "JVET-D0064"), asymmetric coding units were proposed to be used together with QTBT. Four new binary tree split modes (for example, partition types) were introduced in the QTBT structure to allow new split configurations. So-called asymmetric split modes were proposed in addition to the split modes already available in QTBT, as shown in figure 7. As shown in figure 7, the partition types HOR_UP, HOR_DOWN, VER_LEFT, and VER_RIGHT are examples of asymmetric split modes.
[0116] According to the added asymmetric split modes, a coding unit of size S is split into two sub-CUs with sizes S/4 and 3·S/4, either in the horizontal direction (for example, HOR_UP or HOR_DOWN) or in the vertical direction (for example, VER_LEFT or VER_RIGHT). In JVET-D0064, the width or height of a newly added CU could only be 12 or 24.
[0117] Forward and inverse transform techniques will now be discussed. In image/video coding, transforms are mainly applied to 2-D input data sources. Exemplary methods of applying transforms to 2-D input data include separable and non-separable 2-D transforms. Separable 2-D transforms are normally used, since separable transforms require lower operation counts (additions and multiplications) compared to non-separable 2-D transforms.
[0118] In one example, the variable X is a WxH input data matrix, where W is the width of the matrix and H is the height of the matrix. An example of a separable 2-D transform applies 1-D transforms to the horizontal and vertical vectors of X sequentially, formulated as follows:

Y = C · X · R^T

where Y is the transformed matrix of X, and where C and R denote WxW and HxH transform matrices, respectively, which could be in either integer precision or double precision. T (as in equation (1) below) represents the transform matrix in integer values. From the formulation, it can be seen that C applies vertical (column, left) 1-D transforms to the column vectors of X, while R applies horizontal (row, right) 1-D transforms to the row vectors of X.
[0119] In HEVC, W is equal to H (and is equal to S). As such, the WxH data matrix could be denoted by 2^K x 2^K, where K is an integer and S is equal to 2^K. The transform matrices (C and R) are generated as T as follows:

T = int(√S · M · 2^N)    (1)

where T represents the transform matrix in integer values, int() is the function that obtains the nearest integer value for each floating point element value of a matrix, S indicates the size of the transform (such as an 8-point or 32-point transform), M indicates the unitary matrix for the floating-value transform type, and 2^N is a scale factor that controls the accuracy of the integer transform matrix T (such as N = 6, as used in HEVC transforms). A complex square matrix U is unitary if its conjugate transpose U* is also its inverse (that is, if U*U = UU* = I, where I is the identity matrix). In addition, in HEVC, the transform matrix can be derived from T with some elements slightly adjusted by +1 or -1.
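Equation (1) can be illustrated with a small sketch that builds the unitary (orthonormal) DCT-II matrix M and rounds √S · M · 2^N to integers, with N = 6 as in HEVC. This is not from the patent; the function names are this sketch's own, and, as the text notes, HEVC additionally adjusts some rounded entries by +1 or -1.

```python
# Hypothetical sketch of equation (1): T = int(sqrt(S) * M * 2^N),
# with M the orthonormal DCT-II matrix and N = 6.

import math

def unitary_dct(s):
    """S x S orthonormal DCT-II matrix M (rows indexed by frequency k)."""
    m = []
    for k in range(s):
        c = math.sqrt(1.0 / s) if k == 0 else math.sqrt(2.0 / s)
        m.append([c * math.cos(math.pi * (2 * n + 1) * k / (2 * s))
                  for n in range(s)])
    return m

def integer_transform(s, n_bits=6):
    """T = round(sqrt(S) * M * 2^N), cf. equation (1)."""
    scale = math.sqrt(s) * (1 << n_bits)
    return [[round(scale * v) for v in row] for row in unitary_dct(s)]

t4 = integer_transform(4)
print(t4[0])  # [64, 64, 64, 64]: matches the HEVC 4-point DC row
print(t4[1])  # [84, 35, -35, -84]: HEVC's actual row is
              # [83, 36, -36, -83], i.e. entries adjusted by +/-1
```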
[0120] In HEVC, after the horizontal and vertical transforms are applied, the output of the transform is enlarged by approximately (√S · 2^N)^2 compared with the unitary transform (that is, M). Since S is a power of two in HEVC, the value of (√S · 2^N)^2 is also a power of 2. Therefore, the √S introduced in the transform matrix can be compensated with a bit shift during the transform. Since the HEVC matrices are scaled by √S · 2^N = 2^(K/2 + N) (suppose S = 2^K) compared to an orthonormal DCT transform, in order to preserve the norm of the residual block through the two-dimensional forward and inverse transforms, additional scale factors fwd_shift1 and fwd_shift2 are applied. Similarly, for the inverse transform, scale factors are also used, due to the inverse transform matrix being scaled by √S · 2^N. Therefore, to preserve the norm through the forward and inverse two-dimensional transforms, the product of the scale factors must be equal to (1/(√S · 2^N))^4 = 1/2^(2·K + 4·N). For example, in a forward transform in the HEVC reference software, video encoder 22 applies a shift after the horizontal (fwd_shift1) and vertical (fwd_shift2) forward transforms, respectively, as follows, to make sure that after each transform the output fits in 16 bits if D is equal to 8:

fwd_shift1 = log2(S) + D + 6 - r    (2)
fwd_shift2 = log2(S) + 6    (3)

For inverse transforms in the HEVC reference software, video decoder 30 applies a shift after the vertical (inv_shift1) and horizontal (inv_shift2) inverse transforms, respectively, as follows:

inv_shift1 = 7    (4)
inv_shift2 = 5 - D + r    (5)

where D represents the bit depth used to reconstruct the video. The sum of the four shifts is 2·log2(S) + 24 = 2·K + 24. The bit depth can be specified in the SPS. Bit depth values of D = 8 and D = 10 result in 8-bit and 10-bit pixel reconstructions, respectively. The parameter r controls the accuracy of the horizontal forward transform. A higher value of r gives greater accuracy. In some examples, the value of r may be a fixed value of 15, or the maximum of 15 and D + 6 (for example, max(15, D + 6)), depending on the configuration, as specified in the SPS.
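The four shifts of equations (2)-(5) and the stated sum of 2·log2(S) + 24 can be checked with a short sketch. This is not from the patent; r = 15 and D = 8 are example settings, and the function name is this sketch's own.

```python
# Hypothetical sketch of equations (2)-(5): the forward and inverse
# transform shifts, and a check that their sum is 2*log2(S) + 24.

import math

def transform_shifts(s, d=8, r=15):
    log2s = int(math.log2(s))
    fwd_shift1 = log2s + d + 6 - r   # equation (2)
    fwd_shift2 = log2s + 6           # equation (3)
    inv_shift1 = 7                   # equation (4)
    inv_shift2 = 5 - d + r           # equation (5)
    return fwd_shift1, fwd_shift2, inv_shift1, inv_shift2

for s in (4, 8, 16, 32):
    shifts = transform_shifts(s)
    assert sum(shifts) == 2 * int(math.log2(s)) + 24
    print(s, shifts)
```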
[0121] Note that the same value of S is used in the shift1 and shift2 calculations above because only square transforms (for which the horizontal and vertical transform sizes are the same) are used in HEVC.
[0122] In some examples of video coding using QTBT partitioning, some new transforms are introduced, for example, an 8x4 non-square transform. In this case, log2(W) + log2(H) is not an even value. In this case, an additional factor of √2 is introduced, which cannot be compensated with a single bit shift during the transform. Therefore, it was proposed to absorb the value √2 into the quantization process (for example, as opposed to changing the transform matrix), as described in US Patent Publication No. 20170150176, filed on November 22, 2016, and US Patent Publication No. 20170150183, filed on November 22, 2016.
[0123] In asymmetric coding units (for example, those shown in figure 7), transforms with a size not equal to a power of two can be used, such as transforms with sizes of 12 and 24. In this way, these asymmetric coding units introduce more factors that cannot be easily compensated in the transform process. Additional processing may be required to perform the transform or inverse transform on such asymmetric coding units.
[0124] Forward and/or inverse quantization techniques will now be discussed. After the transform process, which compacts the residual energy into the lower frequency coefficients, video encoder 22 applies quantization to control the distortion of the residual reconstruction. At video decoder 30, accordingly, an inverse quantization (dequantization) process is performed before the inverse transform.
[0125] For forward quantization, in the HEVC reference software, video encoder 22 applies a dead-zone plus uniform quantization scheme, as described below:

y' = sign(y) · (|y| · Q + f · 2^qbits) >> qbits    (6)

where y is the input transform coefficient, Q is a quantization scale factor, f is the rounding offset that controls the size of the dead zone (as shown in figure 8, the dead zone is the range [-(1 - f)·Δ, (1 - f)·Δ]), sign(y) = y > 0 ? 1 : -1, qbits is a shift parameter, and y' gives the output quantized transform coefficient. All values that fall in the dead-zone region are quantized to 0. Figure 8 is a conceptual diagram that illustrates a dead-zone plus uniform quantization scheme. In one example, the delta value (Δ in figure 8) can be Δ = 2^qbits.

[0126] In HEVC, for an intra slice, f is 171/512; otherwise, f is 85/512.
[0127] The quantization scale factor Q and the shift parameter qbits above are specified as below:

Q = g_quantScales[QP % 6]    (7)

where QP is the user-defined quantization parameter, and g_quantScales is a constant matrix, as specified in the HEVC reference software below:

const Int g_quantScales[SCALING_LIST_REM_NUM] = { 26214, 23302, 20560, 18396, 16384, 14564 };

[0128] In addition, qbits is derived as below:

qbits = 14 + floor(QP / 6) + iTransformShift    (8)

where iTransformShift = r - D - log2(S), where S is the block size, and D and r are the same as defined in equations (2) and (5) above.
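The dead-zone quantizer of equations (6)-(8) can be sketched as follows. This is not from the patent: r = 15 and D = 8 are example settings, f = 171/512 is the intra-slice rounding offset given in [0126], and the function name is this sketch's own.

```python
# Hypothetical sketch of equations (6)-(8): dead-zone plus uniform
# quantization with the HEVC g_quantScales table.

import math

G_QUANT_SCALES = [26214, 23302, 20560, 18396, 16384, 14564]

def quantize(y, qp, s, d=8, r=15, f=171.0 / 512.0):
    i_transform_shift = r - d - int(math.log2(s))
    qbits = 14 + qp // 6 + i_transform_shift          # equation (8)
    q = G_QUANT_SCALES[qp % 6]                        # equation (7)
    sign = 1 if y >= 0 else -1
    return sign * ((abs(y) * q + int(f * (1 << qbits))) >> qbits)  # (6)

# A coefficient inside the dead zone quantizes to 0; larger ones do not.
print(quantize(10, 28, 8))    # 0
print(quantize(256, 28, 8))   # 1
print(quantize(-256, 28, 8))  # -1
```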
[0129] For inverse quantization in HEVC, the inverse quantization process is specified as below:

y = sign(y') · ((|y'| · DQ + 2^(qbits' - 1)) >> qbits')    (9)

where y' represents the input quantized transform coefficient, y is the dequantized transform coefficient, and DQ is specified as below:

DQ = g_invQuantScales[QP % 6]    (10)

where g_invQuantScales is a constant matrix specified as below:

const Int g_invQuantScales[SCALING_LIST_REM_NUM] = { 40, 45, 51, 57, 64, 72 };

[0130] Furthermore, qbits' is derived as below:

qbits' = 6 - floor(QP / 6) - iTransformShift    (11)

where iTransformShift = r - D - log2(S), where S is the block size, and D and r are as defined in equations (2) and (5).

[0131] From the above definition of iTransformShift in equation (11), it can be seen that the inverse quantization process depends on the block size.
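The inverse quantization of equations (9)-(11) can be sketched as follows. This is not from the patent: r = 15 and D = 8 are example settings, the function name is this sketch's own, and the handling of a negative qbits' as a left shift is an assumption of this sketch.

```python
# Hypothetical sketch of equations (9)-(11): inverse quantization with
# the HEVC g_invQuantScales table.

import math

G_INV_QUANT_SCALES = [40, 45, 51, 57, 64, 72]

def dequantize(yq, qp, s, d=8, r=15):
    i_transform_shift = r - d - int(math.log2(s))
    qbits = 6 - qp // 6 - i_transform_shift            # equation (11)
    dq = G_INV_QUANT_SCALES[qp % 6]                    # equation (10)
    sign = 1 if yq >= 0 else -1
    mag = abs(yq) * dq
    if qbits > 0:
        return sign * ((mag + (1 << (qbits - 1))) >> qbits)  # equation (9)
    return sign * (mag << -qbits)  # assumption: negative shift -> left shift

print(dequantize(2, 10, 32))   # 16: (2 * 64 + 4) >> 3
print(dequantize(-2, 10, 32))  # -16
```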
[0132] Separate tree structures for intra slices will now be discussed. In the VCEG proposal COM16-C966, separate tree structures for intra slices were proposed. To further improve coding performance, especially for chroma components, it was proposed to have different tree structures for the luma and chroma components in intra slices. That is, chroma blocks can be split differently than luma blocks. Intra slices are slices that include intra-coded coding units.
[0133] The following problems are observed with the current proposals for coding video data according to the various QTBT and MTT partitioning structures. When incorporating transforms whose sizes are not a power of 2 into an MTT structure, special processing may be necessary to efficiently handle factors that cannot be compensated with a shift operation during the transform and/or inverse transform process. When using separate luma and chroma tree structures under an MTT structure, a complicated tree structure for the chrominance components may not always be beneficial. When separate luma and chroma trees are extended to inter frames, motion information is signaled for both luma and chroma, which leads to significant signaling costs. When using separate luma and chroma tree structures, sometimes the luma and chroma trees have the same split patterns. In this case, it may not be efficient to signal the luma and chroma partitions separately.
[0134] To solve the problems mentioned above, the following techniques are proposed. Video encoder 22 and video decoder 30 can be configured to perform the following techniques in a reciprocal manner. The following detailed techniques can be applied individually. In addition, each of the following techniques can be used in any combination.
[0135] In a first example of the disclosure, when incorporating transforms whose size is not a power of 2, this disclosure proposes to generate the transform matrices using a rounded, modified size S' instead of the true size S in equation (1), at video decoder 30 and/or at both video encoder 22 and video decoder 30. A transform that has a size that is not a power of 2 can occur when a block has a non-square shape. In this way, the scaling of the transform that cannot be compensated for by a bit shift is absorbed into the transform matrices. Thus, it may not be necessary to change other transform processing and quantization processing techniques (assuming that the √2 problem is handled in the manner discussed above).
[0136] In one aspect of the first example, video encoder 22 and video decoder 30 can be configured to round the true transform size S to obtain the modified value S' by rounding the true size S to a value that is a power of 2. For example, an S value of 12 is rounded to 16, and an S value of 24 is rounded to 32. In general, the value of S' can be obtained by rounding S up, down, or to the nearest power of two.
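The rounding of S to a modified size S' described above can be sketched as follows. This is not from the patent; the function names are this sketch's own. Rounding up reproduces the text's examples of 12 -> 16 and 24 -> 32.

```python
# Hypothetical sketch: rounding a transform size S that is not a power
# of two to a modified size S', rounding up, down, or to the nearest
# power of two.

def round_up_pow2(s):
    p = 1
    while p < s:
        p <<= 1
    return p

def round_down_pow2(s):
    up = round_up_pow2(s)
    return up if up == s else up >> 1

def round_nearest_pow2(s):
    up, down = round_up_pow2(s), round_down_pow2(s)
    return up if up - s <= s - down else down

print(round_up_pow2(12), round_up_pow2(24))            # 16 32
print(round_down_pow2(12), round_down_pow2(24))        # 8 16
print(round_nearest_pow2(12), round_nearest_pow2(24))  # 16 32
```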
[0137] In one example, video decoder 30 can be configured to receive an encoded block of video data. In one example, the encoded block of video data includes quantized transform coefficients. In some examples, the block of video data may be non-square in shape. Video decoder 30 can be further configured to determine a transform for the encoded block of video data, where the transform has a size S that is not a power of two. Video decoder 30 can be further configured to round S to a power of two, creating a transform with a modified size S'. Video decoder 30 can then apply an inverse transform with the modified size S' to the encoded block of video data to create residual video data, and decode the residual video data to create a decoded block of video data.
[0138] Likewise, video encoder 22 can be configured to receive a block of video data. In some instances, video encoder 22 has partitioned the block of video data into a non-square shape. Video encoder 22 can predict (for example, using inter-prediction and/or intra-prediction) the block of video data to create residual video data. Video encoder 22 can determine a transform for the residual video data, where the transform has a size S that is not a power of two. Video encoder 22 can round S to a power of two, creating a transform with a modified size S', and apply a transform with the modified size S' to the residual video data to create transform coefficients. In some examples, video encoder 22 can also quantize the transform coefficients. Video encoder 22 can then encode (for example, using entropy coding, such as CABAC) the transform coefficients into an encoded video bit stream.
[0139] In another aspect of the first example, the adaptive shift in equations (2) and (3) is based on S' instead of S. For example, the modified equations (2)' and (3)' are as follows:

fwd_shift1 = log2(S') + D + 6 - r    (2)'
fwd_shift2 = log2(S') + 6    (3)'

[0140] In another aspect of the first example, when deriving the integer transform matrix T, the scale factor (√S · 2^N) on top of the unitary transform, as shown in equation (1), is replaced by a predefined fixed value, for example, 256, 512, or 1024. In one example, the right shift operation described in equations (2) and (3) is modified so that the values of fwd_shift1 and/or fwd_shift2 do not depend on S; that is, log2(S) is removed from equations (2) and/or (3). In this case, the shift would be independent of the transform size; that is, the shift in equations (2) and (3) is set to a fixed value, regardless of the transform size.
[0141] In another aspect of the first example, for the inverse transform, the initial shift operation is kept unchanged. That is, video decoder 30 can be configured to perform an inverse shift, as described in equations (4) and (5):

inv_shift1 = 7   (4)
inv_shift2 = 5 - D + r   (5)

[0142] In another aspect of the first example, the right shift operation, as described in equations (8) and/or (11), is modified so that the values of qbits and/or qbits' do not depend on S, that is, log2S is removed from equations (8) and/or (11). The modified equations (8)' and (11)' are shown below.
[0143] In an example of the disclosure, when quantizing transform coefficients, video encoder 22 can determine the value of qbits as shown below:

qbits = 14 + ⌊QP / 6⌋ + iTransformShift   (8)'

where iTransformShift = r - D, where D and r are the same as defined in equations (2) and (5) above.
[0144] In addition, when performing inverse quantization of the transform coefficients, video decoder 30 can determine the value of qbits' as shown below:

qbits' = 6 - ⌊QP / 6⌋ - iTransformShift   (11)'

where iTransformShift = r - D, where D and r are as defined in equations (2) and (5).
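A small numeric sketch of equation (8)'; the QP, D, and r values below are arbitrary placeholders, not values from the disclosure:

```python
def qbits_forward(qp: int, r: int, d: int) -> int:
    """Forward quantization shift per equation (8)':
    qbits = 14 + floor(QP / 6) + iTransformShift, with
    iTransformShift = r - D (D and r as in equations (2) and (5)).
    Note that qbits no longer depends on the transform size S.
    """
    i_transform_shift = r - d
    return 14 + qp // 6 + i_transform_shift

# Example with placeholder values QP = 22, r = 15, D = 9:
# qbits = 14 + 3 + 6 = 23
```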
[0145] In another aspect of the first example, the right shift operation, as described in equations (4) and (5), is modified in such a way that the values of inv_shift1 and/or inv_shift2 depend on S, such as:

inv_shift1 = 7 + log2S   (4)''
inv_shift2 = 5 - D + r + log2S   (5)''

[0146] In a second example of the disclosure, when partitioning blocks according to an MTT structure, such as a structure that uses binary tree, symmetric center-side triple tree, quadtree, and asymmetric tree types, video encoder 22 and video decoder 30 can be configured to partition blocks using a two-level MTT. Exemplary two-level MTT structures are described in US Patent Publication No. US 20170272782, filed on March 20, 2017. In a two-level MTT, at a first level (referred to as the region tree level), an image or block of video data is divided into regions, each with one or more tree types that are capable of quickly partitioning a large block into small blocks (for example, using a quadtree or a hexadecatree). At the second level (referred to as the prediction tree level), a region is further split with MTT techniques (not including any additional splits). The leaf node of a prediction tree is referred to in this disclosure as a coding unit (CU). In addition, the following techniques can be used.
[0147] In one aspect of the second example, for a prediction tree that has been signaled as being further split, video encoder 22 can signal a first flag to indicate a vertical or horizontal split. Video encoder 22 can then signal a flag to indicate whether the split is a symmetric split (for example, binary tree or symmetric center-side triple tree). If the split is a symmetric split, video encoder 22 can signal the type index of the desired symmetric partition type, such as binary tree or symmetric center-side triple tree. Otherwise (for example, the split is an asymmetric split), video encoder 22 can signal a flag to indicate whether the asymmetric split is an up or down split (for example, when the split is a horizontal split), or to indicate whether the split is a left or right split (for example, when the split is a vertical split). Video decoder 30 can be configured to receive and parse the above-mentioned flags and partition video blocks accordingly.
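The signaling order of this aspect can be sketched as follows; the syntax element names (`dir_flag`, `sym_flag`, and so on) are hypothetical illustrations, not names from the disclosure:

```python
def signal_split(direction, symmetric, kind=None, side=None):
    """Illustrative signaling order for a further-split prediction tree.

    direction: 'horizontal' or 'vertical'
    symmetric: True for symmetric splits (binary / center-side triple tree)
    kind: index among the symmetric types, when symmetric
    side: 'up'/'down' (horizontal) or 'left'/'right' (vertical), when asymmetric
    Returns the ordered list of (element, value) pairs the encoder signals.
    """
    elements = [('dir_flag', direction),   # first flag: split direction
                ('sym_flag', symmetric)]   # second flag: symmetric or not
    if symmetric:
        elements.append(('sym_type_idx', kind))   # e.g. 0=binary, 1=triple
    else:
        elements.append(('asym_side_flag', side))  # up/down or left/right
    return elements
```

A decoder-side parser would read the same elements in the same order to reconstruct the partition.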
[0148] In another aspect of the second example, for a prediction tree that has been signaled as being further split, video encoder 22 may signal a first flag to indicate a vertical or horizontal split. Then, video encoder 22 can signal a flag to indicate whether the split is a binary tree partition type. If the split is not a binary tree partition type, video encoder 22 can signal the type index of other tree types, such as symmetric center-side triple tree or asymmetric tree. If the split is an asymmetric tree split, video encoder 22 can signal a flag to indicate whether the split is an up or down split (for example, when the split is a horizontal split), or a left or right split (for example, when the split is a vertical split). Video decoder 30 can be configured to receive and parse the above-mentioned flags and partition video blocks accordingly.
[0149] In another aspect of the second example, for a prediction tree that has been signaled as being further split, video encoder 22 can signal a flag to indicate a vertical or horizontal split. Video encoder 22 can then signal a flag to indicate whether the split is a symmetric center-side triple tree split. If the split is not a symmetric center-side triple tree partition type, video encoder 22 can signal the type index of other tree types, such as binary tree or asymmetric tree. If the split is an asymmetric tree split, video encoder 22 can signal a flag to indicate whether the asymmetric tree split is an up or down split (for example, when the split is a horizontal split), or a left or right split (for example, when the split is a vertical split).
[0150] In another aspect of the disclosure, video encoder 22 can be configured to adaptively change the order in which the syntax elements that indicate vertical/horizontal split, and/or up/down split, and/or left/right split, and/or tree-type partition types are signaled, according to characteristics of the encoded information, such as the partition types/indices of associated neighboring blocks, or the slice/picture types. In one example, different slices/pictures can use different orders of signaled syntax elements to indicate partitioning (for example, how the blocks are divided). In another example, video encoder 22 can be configured to change the order of the syntax elements per block. Video decoder 30 can be configured to receive the above-mentioned syntax elements in the same order determined by video encoder 22. Video decoder 30 can determine the order of the syntax elements in the same way as video encoder 22.
[0151] In another aspect of the second example, the context used to entropy code a syntax element indicating an asymmetric tree type partition (for example, upper/lower or left/right partitions) can be derived as follows. Figure 9 shows examples of asymmetric partition types in this example of the disclosure. Let A, B, C be the block sizes of the blocks that cover the locations immediately above the central positions of each of the partitions illustrated in figure 9 (top left). In this example, video encoder 22 and video decoder 30 can determine the index of the context model using a counter, which is initialized to zero. The counter value is used to determine the context.
[0152] Consider the following conditions:
Condition 1: A is not equal to B and B is equal to C.
Condition 2: The above CU is an asymmetric tree vertically partitioned with the boundary in the left half.
[0153] In one example, if either condition 1 or condition 2 is satisfied, video encoder 22 and video decoder 30 can be configured to increment the counter by one. In another example, if both conditions 1 and 2 are satisfied, video encoder 22 and video decoder 30 can be configured to increment the counter by one. In another example, video encoder 22 and video decoder 30 can be configured to increment the counter by one if condition 1 is satisfied. Likewise, video encoder 22 and video decoder 30 can be configured to increment the counter by one if condition 2 is satisfied.
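A minimal sketch of this counter-based context derivation, following the variant in which each satisfied condition independently increments the counter by one (the inputs are hypothetical representations of the neighbor state):

```python
def asym_split_context(a, b, c, above_cu_is_asym_vert_left):
    """Derive a context index for the asymmetric-partition syntax element.

    a, b, c: sizes of the blocks covering the locations immediately above
    the central positions of the candidate partitions (figure 9, top left).
    above_cu_is_asym_vert_left: True if the above CU is an asymmetric
    tree, vertically partitioned with the boundary in the left half.
    """
    counter = 0
    if a != b and b == c:                 # condition 1
        counter += 1
    if above_cu_is_asym_vert_left:        # condition 2
        counter += 1
    return counter  # used as the context model index
```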
[0154] In another example, A, B, C may be at other positions relative to the partitions. In one example, A, B, C are the block sizes of the blocks covering the locations immediately above the upper left corner of each of the partitions shown in figure 9 (top middle). In another example, let A, B, C be the block sizes of the blocks that cover the locations immediately above the upper right corner of each of the partitions shown in figure 9 (top right).
[0155] In another example, let D, E, F be the block sizes of the blocks that cover the locations immediately to the left of the central positions of each of the partitions illustrated in figure 9 (bottom left).
[0156] In this example, consider the following conditions:
Condition 1: D is not equal to E and E is equal to F.
Condition 2: The left CU is an asymmetric tree horizontally partitioned with the boundary in the upper half.
[0157] In one example, if either condition 1 or condition 2 is satisfied, video encoder 22 and video decoder 30 can be configured to increment the counter by one. In another example, if both conditions 1 and 2 are satisfied, video encoder 22 and video decoder 30 can be configured to increment the counter by one. In another example, video encoder 22 and video decoder 30 can be configured to increment the counter by one if condition 1 is satisfied. Likewise, video encoder 22 and video decoder 30 can be configured to increment the counter by one if condition 2 is satisfied.
[0158] In another example, D, E, F can be at other positions. In one example, D, E, F are the block sizes of the blocks that cover the locations immediately to the left of the upper left corner of each of the partitions shown in figure 9 (bottom middle). In another example, let D, E, F be the block sizes of the blocks that cover the locations immediately to the left of the lower left corner of each of the partitions shown in figure 9 (bottom right).
[0159] In another example, video encoder 22 and video decoder 30 can be configured to determine the locations of blocks A-F in a predefined manner. In another example, video encoder 22 can be configured to signal the locations of blocks A-F in the SPS, PPS, or slice header.
[0160] In another aspect of the second example, video encoder 22 and video decoder 30 can be configured to derive the context of the flag used to signal whether the tree is symmetric as follows. In one example, a single context model can be used. In another example, video encoder 22 and video decoder 30 can be configured to use a multi-level context model based on a counter. In one example, the initial value of the counter is zero. If the above CU is an asymmetric block, video encoder 22 and video decoder 30 can be configured to increment the counter by one. If the left CU is an asymmetric block, video encoder 22 and video decoder 30 can be configured to increment the counter by one. If the above-left CU is an asymmetric block, video encoder 22 and video decoder 30 can be configured to increment the counter by one. If the above-right CU is an asymmetric block, video encoder 22 and video decoder 30 can be configured to increment the counter by one. If none of the previous four blocks is an asymmetric block, video encoder 22 and video decoder 30 can be configured to set the counter to 5.
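The counter rule above can be sketched compactly; each argument is a hypothetical boolean summarizing whether the corresponding neighboring CU is an asymmetric block:

```python
def symmetric_flag_context(above, left, above_left, above_right):
    """Counter-based context index for the tree-symmetry flag.

    Each argument is True if that neighboring CU is an asymmetric
    block. If no neighbor is asymmetric, the counter is set to 5;
    otherwise each asymmetric neighbor increments it by one.
    """
    neighbors = [above, left, above_left, above_right]
    if not any(neighbors):
        return 5                      # none of the four is asymmetric
    return sum(1 for is_asym in neighbors if is_asym)
```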
[0161] In another aspect of the second example, video encoder 22 can be configured to conditionally generate and signal a flag to indicate whether a prediction tree is further split. When the size of the prediction tree, without further splitting, would use a transform whose size is not supported, video encoder 22 can be configured not to signal the split flag. Instead, based on the condition that the size of the prediction tree, without further splitting, would use a transform whose size is not supported, both video encoder 22 and video decoder 30 can be configured to infer that the prediction tree must be further split.
[0162] In another aspect of the second example, in a video coding system with variable prediction tree (PT) depth based on the region tree (RT), the minimum allowable block sizes can be RT-dependent. As such, signaling of the PT split flag, the split direction flag (horizontal/vertical), the tree symmetry flag, or the other tree-type flags mentioned above can be avoided. In addition, a similar restriction can be imposed on CTUs lying on picture boundaries if the CTU is allowed to be split with certain or all PT types.
[0163] In an example of the disclosure, video encoder 22 can be configured to partition video data using a two-level multi-type tree partitioning structure, and generate syntax elements indicating how the prediction tree of the two-level multi-type tree partitioning structure is structured, the syntax elements including one or more of: a flag to indicate a vertical or horizontal split, a flag to indicate whether the split is a symmetric split, a type index, or a flag to indicate whether an asymmetric split is an up or down split or a left or right split. In one example, video encoder 22 can be configured to determine contexts for the syntax elements based on the block sizes of neighboring blocks. In another example of the disclosure, video encoder 22 can be configured to determine contexts for the syntax elements based on the partition types of neighboring blocks.
[0164] In a third example of the disclosure, when using a separate tree structure for the luma and chroma components (for example, luma blocks and chroma blocks are partitioned separately), video encoder 22 can be configured to generate and signal syntax elements that indicate the allowed tree types for the luma and chroma components separately. That is, video encoder 22 can generate separate syntax elements indicating the tree types allowed for both luma and chroma blocks. The values of the syntax elements can indicate which of the two or more tree types are allowed for a particular luma or chroma block. Example tree types can be any of the tree types discussed above, including symmetric and asymmetric binary tree types, quadtree tree types, and symmetric and asymmetric triple tree types. Video decoder 30 can be configured to parse the syntax elements indicating the allowed tree types.
[0165] Video encoder 22 can also be configured to signal additional syntax elements that indicate which of the allowed tree types is used for a specific block. Video decoder 30 can be configured to parse the additional syntax elements and determine how to partition a particular block from the syntax elements that indicate the allowed tree types and the additional syntax elements that indicate the specific tree type to use from among the allowed tree types.
[0166] In one example, video encoder 22 can signal the allowed tree types for the luma and chroma components separately, in a video parameter set (VPS), a sequence parameter set (SPS), a picture parameter set (PPS), an adaptation parameter set (APS), or any other sequence/picture/slice level syntax element body. In one example, the tree types can include at least two of the binary tree, symmetric center-side triple tree, quadtree, and asymmetric tree types. In another example, the binary tree and/or symmetric center-side triple tree types can always be enabled (for example, always allowed), while the asymmetric tree and/or other CU partition types are optional and are signaled in the bitstream.
[0167] In a fourth example of the disclosure, when extending the separate luma/chroma tree structure to inter slices, video encoder 22 can be configured to signal the motion information only once, for the luma tree (also called the primary tree). Video decoder 30 can then be configured to inherit (for example, reuse) the motion information for other tree structures (for example, for secondary tree structures for chroma blocks) for blocks in a position collocated with the luma blocks. In the event that a collocated chroma block is larger than a single collocated luma block, video decoder 30 can be configured to reuse the motion information of all collocated luma blocks; that is, a chroma block can include multiple sets of motion information from all collocated luma blocks. In another example, video decoder 30 can be configured to use the motion information of the primary tree as a predictor for the motion information of other trees (for example, the tree structure for chroma blocks).
[0168] In general, according to this example of the disclosure, video decoder 30 can be configured to partition a luma block of video data, and to partition one or more chroma blocks of video data separately from the partitioning of the luma block of video data. The partitioning structure for the luma block of video data can be a multi-type tree partitioning structure, and the partitioning structure for the one or more chroma blocks of video data can also be a multi-type tree partitioning structure. Video decoder 30 can further be configured to, for an inter slice, determine the motion information of the luma block of video data, and infer the motion information for the one or more chroma blocks of video data from the motion information determined for the luma block of video data.
[0169] In a fifth example of the disclosure, in separate tree coding (for example, luma and chroma blocks are partitioned separately, with potentially different partitioning), the split tree pattern of a chroma tree (which may be called a secondary tree) is inherited from the luma tree (which may be called a primary tree) when the collocated block in the primary tree contains both intra-coded and inter-coded blocks. That is, video encoder 22 and video decoder 30 can be configured to partition chroma blocks using the same split pattern as the associated luma blocks when the luma blocks include both intra-coded and inter-coded blocks.
[0170] In this example, video encoder 22 can be configured to signal a split tree pattern for the secondary tree only when the collocated block in the primary tree (for example, the luma block) contains only blocks of the same type. That is, the collocated luma blocks are either all inter-coded or all intra-coded. In one example, the split tree pattern includes, but is not limited to, a tree type (no split can also be considered a special tree type). Tree types can include both symmetric and asymmetric tree types, such as the binary tree, triple tree, and quadtree.
[0171] In general, according to this example of the disclosure, video decoder 30 can be configured to partition a luma block of video data, infer the partitioning for one or more chroma blocks of video data to be the same as for the luma block of video data in the case where the luma block includes both intra-coded and inter-coded blocks, and determine, from signaled syntax elements, the partitioning for the one or more chroma blocks of video data when the luma blocks all include the same type of coded blocks.
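A compact sketch of this inheritance rule; representing the collocated luma blocks by 'intra'/'inter' mode strings is a hypothetical simplification:

```python
def chroma_tree_is_inherited(luma_block_modes):
    """Fifth example: the secondary (chroma) tree inherits the luma
    split pattern when the collocated luma area mixes intra- and
    inter-coded blocks; otherwise a chroma split pattern is signaled.

    luma_block_modes: iterable of 'intra' / 'inter' strings for the
    collocated luma blocks.
    """
    modes = set(luma_block_modes)
    return modes == {'intra', 'inter'}   # mixed types -> inherit
```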
[0172] In a sixth example of the disclosure, in order to reduce the cost of signaling a PT split flag, video encoder 22 and video decoder 30 can be configured to determine the contexts of the PT split flag using the sizes of the neighboring blocks relative to the current block.
[0173] In one example, video encoder 22 and video decoder 30 can be configured to select the context for coding the PT split flag using the size of the current block relative to its neighboring blocks. In one example, when the width of the current block is greater than the width of its above neighbor, the current block is more likely to be further split. Likewise, when the height of the current block is greater than the height of its left neighbor, the current block is more likely to be further split. In addition, the sizes of the above-left, above-right, and below-left neighbors relative to the size of the current block also provide useful information for determining whether the current block should be further split. If the size of the current block is larger than the sizes of its neighboring blocks, the current block is also more likely to be further split. Video encoder 22 and video decoder 30 can be configured to use the aggregate number of occurrences of the events mentioned above as an index to the context for the PT split flag. In addition, an individual event can also form a set of contexts for the PT split flag.
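The aggregate-count idea can be sketched as follows, under the assumption that each satisfied condition contributes one to the context index and that unavailable neighbors are skipped:

```python
def pt_split_flag_context(cur_w, cur_h, above_w, left_h,
                          above_left_size, above_right_size, below_left_size):
    """Aggregate event count used as the PT split flag context index.

    Widths and heights are in samples; the three corner-neighbor sizes
    are areas. Unavailable neighbors are passed as None. The exact
    event set and equal weighting are illustrative assumptions.
    """
    events = 0
    if above_w is not None and cur_w > above_w:
        events += 1                       # wider than the above neighbor
    if left_h is not None and cur_h > left_h:
        events += 1                       # taller than the left neighbor
    cur_size = cur_w * cur_h
    for nb in (above_left_size, above_right_size, below_left_size):
        if nb is not None and cur_size > nb:
            events += 1                   # larger than a corner neighbor
    return events                         # context index in [0, 5]
```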
[0174] In one example, video encoder 22 and video decoder 30 can be configured to use the width of the above neighboring block and the height of the left neighboring block to determine a context for the PT split direction (for example, horizontal split or vertical split). If the width of the above neighboring block is less than that of the current block and the height of the left neighboring block is greater than or equal to that of the current block, the current block is more likely to be split vertically. Likewise, if the height of the left neighboring block is less than that of the current block and the width of the above neighboring block is greater than or equal to that of the current block, the current block is more likely to be split horizontally.
[0175] In another example, video encoder 22 and video decoder 30 can be configured to use the width of the above neighboring block and the height of the left neighboring block to determine a context for a PT split mode (for example, a determination between split modes, such as between binary tree and center-side triple tree). If the width of the above neighboring block is less than that of the current block and the current block is split vertically, or if the height of the left neighboring block is less than that of the current block and the current block is split horizontally, the current block is more likely to be split as a triple tree.
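The split-direction rule of paragraph [0174] might be sketched as below; mapping the two likelihood cases to concrete context indices 2 and 1 is an assumption for illustration:

```python
def pt_direction_context(cur_w, cur_h, above_w, left_h):
    """Context for the PT split direction flag (vertical vs. horizontal).

    Returns 2 when a vertical split is more likely (narrow above
    neighbor, tall-enough left neighbor), 1 when a horizontal split is
    more likely, and 0 otherwise. The index mapping is illustrative.
    """
    if above_w < cur_w and left_h >= cur_h:
        return 2   # narrow above neighbor suggests a vertical split
    if left_h < cur_h and above_w >= cur_w:
        return 1   # short left neighbor suggests a horizontal split
    return 0
```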
[0176] In one example, if neighboring blocks are not available, video encoder 22 and video decoder 30 can be configured to use a default context value when deriving the current context. In another example, when different RT or PT splits are allowed for different components (such as the Y, Cb, Cr, or depth components), all the methods mentioned above can be applied, using the associated blocks in the other components as the neighboring blocks.
[0177] In another example, video encoder 22 and video decoder 30 can be configured to use the depth of neighboring blocks relative to the current block to determine the context of the PT split syntax element.
[0178] In another example, for a side of a block with a width/height equal to three times the minimum block size, video encoder 22 and video decoder 30 can be configured to use the block size or PT depth of the central location of the above or left neighboring blocks to derive the contexts.
[0179] In another example, when the width/height of a block is equal to three times the minimum block size, video encoder 22 and video decoder 30 can be configured to use the average value of the block sizes or PT depths of the three neighboring blocks to derive the contexts.
[0180] In another example, when the width/height of a block is equal to three times the minimum block size, video encoder 22 and video decoder 30 can be configured to use the maximum value of the block sizes or PT depths of the three neighboring blocks to derive the contexts.
[0181] In another example, when the width/height of a block is equal to three times the minimum block size, video encoder 22 and video decoder 30 can be configured to use the minimum value of the block sizes or PT depths of the three neighboring blocks to derive the contexts.
[0182] In another example, when the width/height of a block is equal to three times the minimum block size, video encoder 22 and video decoder 30 can be configured to use the median value of the block sizes or PT depths of the three neighboring blocks to derive the contexts.
[0183] In another example, the aggregate value of the block sizes or PT depths (as described above) is a representation of the size of the neighboring block. In one example, if the width of the above neighbor is less than the width of the current block, context 1 is used; otherwise, context 0 is used. Likewise, if the height of the left neighbor is less than the height of the current block, context 1 is used; otherwise, context 0 is used. In one example, the aggregate value of the block sizes or PT depths can be used to increment a counter that equals or controls the index of the context model to be used. In addition, the aggregated value can be plugged into the equations below as a substitute for the individually positioned values. For each of the neighboring blocks, the following context establishment process (CTX) can be performed in order. The sum of CTX can be used to select the context index. In another example, the first two equations below are first applied to each of the neighboring blocks, and then the last equation is used to select the context index, where the CTX input is the sum of CTX over all neighboring blocks.

CTX = (W > W_T2) + (H > H_L2) + (W * H > S_TL) + (W * H > S_TR)
CTX = ((W < W_T2) && (H < H_L2) && (W * H < S_TL) && (W * H < S_TR)) ? 0 : CTX
CTX = (CTX > 3) ? 3 : CTX

[0184] In general, according to this example of the disclosure, video encoder 22 and video decoder 30 can be configured to determine a context for a split flag of a current block based on the sizes of the neighboring blocks relative to the current block, and context code the split flag based on the determined context.
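Under the assumption that W_T2, H_L2, S_TL, and S_TR denote the above neighbor's width, the left neighbor's height, and the above-left and above-right neighbor sizes (an interpretation for illustration, not stated explicitly in the equations), the clipped context selection might look like:

```python
def ctx_index(w, h, above_w, left_h, s_tl, s_tr):
    """Clipped context index for the PT split flag.

    w, h: current block width and height. The neighbor-variable
    meanings (above width, left height, corner sizes) are assumed.
    """
    size = w * h
    # Count how many neighbor measurements the current block exceeds.
    ctx = (w > above_w) + (h > left_h) + (size > s_tl) + (size > s_tr)
    # If the current block is strictly smaller than every neighbor
    # measurement, force context 0.
    if w < above_w and h < left_h and size < s_tl and size < s_tr:
        ctx = 0
    return min(ctx, 3)  # clip: CTX = (CTX > 3) ? 3 : CTX
```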
[0185] In a seventh example of the disclosure, in any tree structure framework, such as MTT, video encoder 22 and/or video decoder 30 can be configured to apply transforms based on the block size. In some examples, certain block sizes may have no related transforms (that is, no transforms of the same size are supported). For example, video encoder 22 and/or video decoder 30 can be configured to split a CU into a 64x48 block, but may not be configured to use a 48x48 transform. In another example, video encoder 22 and/or video decoder 30 can be configured to split a 256x256 CU, but the largest supported transform is only 128x128.
[0186] In these examples, video encoder 22 and/or video decoder 30 can be configured to allow only certain coding modes for such CUs. In one example, video encoder 22 and/or video decoder 30 can be configured to use only skip mode for a CU if there is no transform related to the CU. In this example, a related transform is a transform with a size equal to at least one dimension of the CU. In this example, video encoder 22 may not signal a skip flag. Instead, video decoder 30 can be configured to infer that the skip flag value is true based on the CU being of a size without related transforms.
[0187] In another example, video encoder 22 and/or video decoder 30 can be configured to disallow a CU from having any non-zero residual values if there is no supported transform for the CU size. In this example, video encoder 22 may not signal a coded block flag (CBF). A coded block flag is a flag that indicates whether or not a block includes any non-zero transform coefficients. Since, in this example, a CU may not have any non-zero residual values if the CU does not have a supported transform, video decoder 30 may infer that the CBF flag is zero (that is, there are no non-zero transform coefficients).
[0188] In one example, video encoder 22 and video decoder 30 can be configured to determine a coding mode for a block of video data based on the size of the block of video data. In particular, video encoder 22 and video decoder 30 can be configured to determine a coding mode for the block of video data based on the size of the block of video data and the transforms that are supported by video encoder 22 and video decoder 30. If there is no supported transform related to the size of the block of video data, video encoder 22 and video decoder 30 can be configured to determine certain predetermined coding modes for the block of video data. In one example, the coding mode can be skip mode. In another example of the disclosure, the coding mode can be another coding mode (for example, merge mode, AMVP mode, or intra mode), but the CBF flag is inferred to be zero.
[0189] In general, video encoder 22 and video decoder 30 can be configured to determine whether a block has an associated transform based on the block size, and set the coding mode for the block if the block does not have an associated transform.
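A hedged sketch of this mode restriction; the set of supported transform sizes and the interpretation that both CU dimensions must be supported sizes are assumptions for illustration:

```python
SUPPORTED_TRANSFORM_SIZES = {4, 8, 16, 32, 64, 128}  # assumed example set

def has_related_transform(width: int, height: int) -> bool:
    """A related transform exists when both CU dimensions are supported
    transform sizes (one interpretation, used here for illustration)."""
    return (width in SUPPORTED_TRANSFORM_SIZES
            and height in SUPPORTED_TRANSFORM_SIZES)

def infer_flags(width: int, height: int):
    """Decoder-side inference: without a related transform, the skip
    flag is inferred true (or, under the CBF variant, cbf is inferred
    zero); otherwise the flags are parsed from the bitstream."""
    if not has_related_transform(width, height):
        return {'skip_flag': True, 'cbf': 0}   # inferred, not signaled
    return None  # parse the flags from the bitstream as usual
```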
[0190] Figure 10 is a block diagram illustrating an example video encoder 22 that can implement the techniques of this disclosure. Figure 10 is provided for purposes of explanation and should not be considered limiting of the techniques as broadly exemplified and described in this disclosure. The techniques of this disclosure may be applicable to various coding standards or methods.
[0191] In the example of figure 10, video encoder 22 includes a prediction processing unit 100, a video data memory 101, a residual generation unit 102, a transform processing unit 104, a quantization unit 106, an inverse quantization unit 108, an inverse transform processing unit 110, a reconstruction unit 112, a filter unit 114, a decoded picture buffer 116, and an entropy encoding unit 118. Prediction processing unit 100 includes an inter prediction processing unit 120 and an intra prediction processing unit 126. Inter prediction processing unit 120 may include a motion estimation unit and a motion compensation unit (not shown).
[0192] Video data memory 101 can be configured to store video data to be encoded by the components of video encoder 22. The video data stored in video data memory 101 can be obtained, for example, from video source 18. Decoded picture buffer 116 can be a reference picture memory that stores reference video data for use in encoding video data by video encoder 22, for example, in intra or inter coding modes. Video data memory 101 and decoded picture buffer 116 can be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. Video data memory 101 and decoded picture buffer 116 can be provided by the same memory device or by separate memory devices. In various examples, video data memory 101 can be on-chip with other components of video encoder 22, or off-chip relative to those components. Video data memory 101 can be the same as, or part of, storage media 20 of figure 1.
[0193] Video encoder 22 receives video data. Video encoder 22 can encode each CTU into a slice of an image of the video data. Each of the CTUs can be associated with luma coding tree blocks (CTBS) of equal size and corresponding CTBs of the image. As part of encoding a CTU, prediction processing unit 100 can perform partitioning to divide CTU CTBs into progressively smaller blocks. The smallest blocks can be CU coding blocks. For example, the prediction processing unit 100 can divide a CTB associated with a CTU according to a tree structure. In accordance with one or more techniques of this disclosure, for each respective non-leaf node of the tree structure, at each level of the depth of the tree structure, There are a plurality of division patterns allowed for the respective non-leaf node and the block of video corresponding to the respective non-leaf node is divided into video blocks that correspond to the child nodes of the respective non-leaf node according to one of the plurality of allowed division patterns. In one example, the prediction processing unit 100 or other processing unit of the video encoder 22 can be configured to perform any combination of the MTT partitioning techniques described above.
[0194] Video encoder 22 can encode the CUs of a CTU to generate encoded representations of the CUs (i.e., encoded CUs). As part of encoding a CU, prediction processing unit 100 can partition the coding blocks associated with the CU among one or more PUs of the CU. In accordance with the techniques of this disclosure, a CU may include only a single PU. That is, in some examples of this disclosure, a CU is not divided into separate prediction blocks; rather, a prediction process is performed on the entire CU. Thus, each CU can be associated with a luma prediction block and corresponding chroma prediction blocks. Video encoder 22 and video decoder 30 can support CUs having various sizes. As indicated above, the size of a CU can refer to the size of the luma coding block of the CU as well as the size of a luma prediction block. As discussed above, video encoder 22 and video decoder 30 can support CU sizes defined by any combination of the example MTT partition types described above.
[0195] Inter-prediction processing unit 120 can generate predictive data for a PU by performing inter prediction on each PU of a CU. As explained above, in some MTT examples of this disclosure, a CU may contain only a single PU; that is, CU and PU may be synonymous. The predictive data for the PU can include predictive blocks for the PU and motion information for the PU. Inter-prediction processing unit 120 can perform different operations for a PU or CU depending on whether the PU is in an I slice, a P slice, or a B slice. In an I slice, all PUs are intra predicted. Hence, if the PU is in an I slice, inter-prediction processing unit 120 does not perform inter prediction on the PU. Thus, for blocks encoded in I-mode, the predicted block is formed using spatial prediction from previously encoded neighboring blocks within the same frame. If a PU is in a P slice, inter-prediction processing unit 120 can use uni-directional inter prediction to generate a predictive block of the PU. If a PU is in a B slice, inter-prediction processing unit 120 can use uni-directional or bi-directional inter prediction to generate a predictive block of the PU.
[0196] Intra-prediction processing unit 126 can generate predictive data for a PU by performing intra prediction on the PU. The predictive data for the PU can include predictive blocks for the PU and various syntax elements. Intra-prediction processing unit 126 can perform intra prediction on PUs in I slices, P slices, and B slices.
[0197] To perform intra prediction on a PU, intra-prediction processing unit 126 can use multiple intra-prediction modes to generate multiple sets of predictive data for the PU. Intra-prediction processing unit 126 can use samples from sample blocks of neighboring PUs to generate a predictive block for a PU. The neighboring PUs can be above, above and to the right, above and to the left, or to the left of the PU, assuming a left-to-right, top-to-bottom encoding order for PUs, CUs, and CTUs. Intra-prediction processing unit 126 can use various numbers of intra-prediction modes, for example, 33 directional intra-prediction modes. In some examples, the number of intra-prediction modes may depend on the size of the region associated with the PU.
[0198] Prediction processing unit 100 can select the predictive data for the PUs of a CU from among the predictive data generated by inter-prediction processing unit 120 for the PUs or the predictive data generated by intra-prediction processing unit 126 for the PUs. In some examples, prediction processing unit 100 selects the predictive data for the PUs of the CU based on rate/distortion metrics of the sets of predictive data. The predictive blocks of the selected predictive data may be referred to herein as the selected predictive blocks.
[0199] Residual generation unit 102 can generate, based on the coding blocks (for example, luma, Cb, and Cr coding blocks) for a CU and the selected predictive blocks (for example, predictive luma, Cb, and Cr blocks) for the PUs of the CU, residual blocks (for example, luma, Cb, and Cr residual blocks) for the CU. For example, residual generation unit 102 can generate the residual blocks of the CU such that each sample in the residual blocks has a value equal to a difference between a sample in a coding block of the CU and a corresponding sample in a corresponding selected predictive block of a PU of the CU.
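The residual computation described above amounts to an element-wise subtraction of the predictive block from the coding block. A rough sketch (the 2-D-list block representation and sample values are illustrative, not the disclosure's implementation):

```python
def residual_block(coding_block, predictive_block):
    """Element-wise difference between original and predicted samples.

    Both inputs are 2-D lists of integer sample values of the same size
    (a hypothetical block layout, for illustration only).
    """
    return [
        [orig - pred for orig, pred in zip(orig_row, pred_row)]
        for orig_row, pred_row in zip(coding_block, predictive_block)
    ]

# Example on a 2x2 luma block:
original = [[120, 130], [125, 128]]
predicted = [[118, 131], [125, 120]]
print(residual_block(original, predicted))  # [[2, -1], [0, 8]]
```

Good prediction drives the residual samples toward zero, which is what makes the subsequent transform and quantization stages effective.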
[0200] Transform processing unit 104 can perform quadtree partitioning to partition the residual blocks associated with a CU into transform blocks associated with the TUs of the CU. Thus, a TU can be associated with a luma transform block and two chroma transform blocks. The sizes and positions of the luma and chroma transform blocks of the TUs of a CU may or may not be based on the sizes and positions of the prediction blocks of the PUs of the CU. A quadtree structure known as a residual quadtree (RQT) can include nodes associated with each of the regions. The TUs of a CU can correspond to leaf nodes of the RQT. In other examples, transform processing unit 104 can be configured to partition TUs according to the MTT techniques described above. For example, video encoder 22 may not further divide CUs into TUs using an RQT structure. As such, in one example, a CU includes a single TU.
[0201] Transform processing unit 104 can generate transform coefficient blocks for each TU of a CU by applying one or more transforms to the transform blocks of the TU. Transform processing unit 104 can apply multiple transforms to a transform block associated with a TU. For example, transform processing unit 104 can apply a discrete cosine transform (DCT), a directional transform, or a conceptually similar transform to a transform block. In some examples, transform processing unit 104 does not apply a transform to a transform block. In such examples, the transform block can be treated as a transform coefficient block.

[0202] Quantization unit 106 can quantize the transform coefficients in a coefficient block. The quantization process can reduce the bit depth associated with some or all of the transform coefficients. For example, an n-bit transform coefficient can be rounded down to an m-bit transform coefficient during quantization, where n is greater than m. Quantization unit 106 can quantize a coefficient block associated with a TU of a CU based on a quantization parameter (QP) value associated with the CU. Video encoder 22 can adjust the degree of quantization applied to the coefficient blocks associated with a CU by adjusting the QP value associated with the CU. Quantization can introduce loss of information; thus, quantized transform coefficients may have lower precision than the original ones.
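As a toy illustration of the QP-controlled precision loss described above: a scalar quantizer divides each coefficient by a step size that grows with QP. The step-size rule below (step doubling every 6 QP units, as in HEVC-style codecs) is an illustrative assumption, not the disclosure's exact quantization scheme:

```python
def quantize(coeff, qp):
    """Toy scalar quantizer: larger QP -> coarser step -> more information loss."""
    step = 2 ** (qp // 6)  # assumed HEVC-like rule: step doubles every 6 QP units
    sign = -1 if coeff < 0 else 1
    return sign * (abs(coeff) // step)

def dequantize(level, qp):
    """Inverse of the toy quantizer; cannot restore the truncated remainder."""
    step = 2 ** (qp // 6)
    return level * step

# A coefficient survives QP 0 exactly but loses precision at QP 24:
print(quantize(300, 0), dequantize(quantize(300, 0), 0))     # 300 300
print(quantize(300, 24), dequantize(quantize(300, 24), 24))  # 18 288
```

The round trip at QP 24 reconstructs 288 rather than 300: that difference is the irreversible loss the paragraph refers to.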
[0203] Inverse quantization unit 108 and inverse transform processing unit 110 can apply inverse quantization and an inverse transform to a coefficient block, respectively, to reconstruct a residual block from the coefficient block. Reconstruction unit 112 can add the reconstructed residual block to corresponding samples of one or more predictive blocks generated by prediction processing unit 100 to produce a reconstructed transform block associated with a TU. By reconstructing the transform blocks for each TU of a CU in this way, video encoder 22 can reconstruct the coding blocks of the CU.

[0204] Filter unit 114 can perform one or more deblocking operations to reduce blocking artifacts in the coding blocks associated with a CU. Decoded image buffer 116 can store the reconstructed coding blocks after filter unit 114 performs the one or more deblocking operations on the reconstructed coding blocks. Inter-prediction processing unit 120 can use a reference image containing the reconstructed coding blocks to perform inter prediction on PUs of other images. In addition, intra-prediction processing unit 126 can use reconstructed coding blocks in decoded image buffer 116 to perform intra prediction on other PUs in the same image as the CU.
[0205] Entropy encoding unit 118 can receive data from other functional components of video encoder 22. For example, entropy encoding unit 118 can receive coefficient blocks from quantization unit 106 and can receive syntax elements from prediction processing unit 100. Entropy encoding unit 118 can perform one or more entropy encoding operations on the data to generate entropy-encoded data. For example, entropy encoding unit 118 can perform a CABAC operation, a context-adaptive variable-length coding (CAVLC) operation, a variable-to-variable (V2V) length coding operation, a syntax-based context-adaptive binary arithmetic coding (SBAC) operation, a probability interval partitioning entropy (PIPE) coding operation, an exponential Golomb coding operation, or another type of entropy encoding operation on the data. Video encoder 22 can output a bit stream that includes the entropy-encoded data generated by entropy encoding unit 118. For example, the bit stream can include data that represents the partition structure for a CU according to the techniques of this disclosure.
[0206] Figure 11 is a block diagram illustrating an example video decoder 30 that is configured to implement the techniques of this disclosure. Figure 11 is provided for purposes of explanation and is not limiting of the techniques as broadly exemplified and described in this disclosure. For purposes of explanation, this disclosure describes video decoder 30 in the context of HEVC coding. However, the techniques of this disclosure may be applicable to other coding standards or methods.
[0207] In the example of Figure 11, video decoder 30 includes an entropy decoding unit 150, video data memory 151, a prediction processing unit 152, an inverse quantization unit 154, an inverse transform processing unit 156, a reconstruction unit 158, a filter unit 160, and a decoded image buffer 162. Prediction processing unit 152 includes a motion compensation unit 164 and an intra-prediction processing unit 166. In other examples, video decoder 30 can include more, fewer, or different functional components.
[0208] Video data memory 151 can store encoded video data, such as an encoded video bit stream, to be decoded by the components of video decoder 30. The video data stored in video data memory 151 can be obtained, for example, from computer-readable medium 16, for example, from a local video source, such as a camera, via wired or wireless network communication of video data, or by accessing physical data storage media. Video data memory 151 can form a coded picture buffer (CPB) that stores encoded video data from an encoded video bit stream. Decoded image buffer 162 can be a reference image memory that stores reference video data for use in decoding video data by video decoder 30, for example, in intra- or inter-coding modes, or for output. Video data memory 151 and decoded image buffer 162 can be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. Video data memory 151 and decoded image buffer 162 can be provided by the same memory device or by separate memory devices. In various examples, video data memory 151 can be on-chip with other components of video decoder 30, or off-chip relative to those components. Video data memory 151 can be the same as, or part of, storage media 28 of Figure 1.
[0209] Video data memory 151 receives and stores encoded video data (for example, NAL units) of a bit stream. Entropy decoding unit 150 can receive encoded video data (for example, NAL units) from video data memory 151 and can parse the NAL units to obtain syntax elements. Entropy decoding unit 150 can entropy decode entropy-encoded syntax elements in the NAL units. Prediction processing unit 152, inverse quantization unit 154, inverse transform processing unit 156, reconstruction unit 158, and filter unit 160 can generate decoded video data based on the syntax elements extracted from the bit stream. Entropy decoding unit 150 can perform a process generally reciprocal to that of entropy encoding unit 118.
[0210] In accordance with some examples of this disclosure, entropy decoding unit 150, or another processing unit of video decoder 30, can determine a tree structure as part of obtaining the syntax elements from the bit stream. The tree structure can specify how an initial video block, such as a CTB, is partitioned into smaller video blocks, such as coding units. In accordance with one or more techniques of this disclosure, for each respective non-leaf node of the tree structure at each depth level of the tree structure, there are a plurality of allowed partition patterns for the respective non-leaf node, and the video block corresponding to the respective non-leaf node is partitioned into video blocks corresponding to the child nodes of the respective non-leaf node according to one of the plurality of allowed partition patterns.
[0211] In addition to obtaining syntax elements from the bit stream, video decoder 30 can perform a reconstruction operation on a non-partitioned CU. To perform the reconstruction operation on a CU, video decoder 30 can perform a reconstruction operation on each TU of the CU. By performing the reconstruction operation for each TU of the CU, video decoder 30 can reconstruct residual blocks of the CU. As discussed above, in one example of the disclosure, a CU includes a single TU.
[0212] As part of performing a reconstruction operation on a TU of a CU, inverse quantization unit 154 can inverse quantize, i.e., dequantize, coefficient blocks associated with the TU. After inverse quantization unit 154 inverse quantizes a coefficient block, inverse transform processing unit 156 can apply one or more inverse transforms to the coefficient block in order to generate a residual block associated with the TU. For example, inverse transform processing unit 156 can apply an inverse discrete cosine transform, an inverse integer transform, an inverse Karhunen-Loeve transform (KLT), an inverse rotational transform, an inverse directional transform, or another inverse transform to the coefficient block.
[0213] If a CU or PU is encoded using intra prediction, intra-prediction processing unit 166 can perform intra prediction to generate predictive blocks for the PU. Intra-prediction processing unit 166 can use an intra-prediction mode to generate the predictive blocks for the PU based on samples of spatially neighboring blocks. Intra-prediction processing unit 166 can determine the intra-prediction mode for the PU based on one or more syntax elements obtained from the bit stream.
[0214] If a PU is encoded using inter prediction, entropy decoding unit 150 can determine motion information for the PU. Motion compensation unit 164 can determine, based on the motion information of the PU, one or more reference blocks. Motion compensation unit 164 can generate, based on the one or more reference blocks, predictive blocks (for example, predictive luma, Cb, and Cr blocks) for the PU. As discussed above, in one example of the disclosure using MTT partitioning, a CU may include only a single PU. That is, a CU may not be divided into multiple PUs.
[0215] Reconstruction unit 158 can use the transform blocks (for example, luma, Cb, and Cr transform blocks) for the TUs of a CU and the predictive blocks (for example, luma, Cb, and Cr blocks) for the PUs of the CU, i.e., either intra-prediction data or inter-prediction data, as applicable, to reconstruct the coding blocks (for example, luma, Cb, and Cr coding blocks) for the CU. For example, reconstruction unit 158 can add samples of the transform blocks (for example, luma, Cb, and Cr transform blocks) to corresponding samples of the predictive blocks (for example, predictive luma, Cb, and Cr blocks) to reconstruct the coding blocks (for example, luma, Cb, and Cr coding blocks) of the CU.
[0216] Filter unit 160 can perform a deblocking operation to reduce blocking artifacts associated with the coding blocks of the CU. Video decoder 30 can store the coding blocks of the CU in decoded image buffer 162. Decoded image buffer 162 can provide reference images for subsequent motion compensation, intra prediction, and presentation on a display device, such as display device 32 of Figure 1. For example, video decoder 30 can perform, based on the blocks in decoded image buffer 162, intra-prediction or inter-prediction operations for PUs of other CUs.
[0217] Figure 12 is a flowchart showing an example encoding method of the disclosure. The techniques of Figure 12 can be performed by video encoder 22, including transform processing unit 104 and/or quantization unit 106.
[0218] In one example of the disclosure, video encoder 22 can be configured to receive a block of video data (200), and predict the block of video data to create residual video data (202). Video encoder 22 can be further configured to determine a transform for the residual video data, wherein the transform has a size S that is not a power of two (204), and round S to a power of two, creating a transform with a modified size S' (206). Video encoder 22 can further apply the transform with the modified size S' to the residual video data to create transform coefficients (208), and encode the transform coefficients in an encoded video bit stream (210).
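The size-rounding step above can be sketched as follows. The disclosure names the rounding rule (to the nearest power of two, e.g., 12 to 16 and 24 to 32); the function itself, including the tie-upward choice at exact midpoints, is an illustrative implementation, not text from the patent:

```python
import math

def round_to_power_of_two(s):
    """Round a transform size S to the nearest power of two.

    E.g., S = 12 -> 16 and S = 24 -> 32, matching the examples in the
    disclosure (ties are resolved upward here; the tie rule is an assumption).
    """
    lower = 2 ** math.floor(math.log2(s))  # largest power of two <= s
    upper = lower * 2                      # smallest power of two > s
    return lower if (s - lower) < (upper - s) else upper

print(round_to_power_of_two(12))  # 16
print(round_to_power_of_two(24))  # 32
```

Note that both sizes named in the disclosure, 12 and 24, sit exactly halfway between two powers of two, so the disclosure's examples are consistent with rounding ties upward.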
[0219] In another example, video encoder 22 can be configured to round S to the nearest power of two.
[0220] In another example, video encoder 22 can be configured to quantize the transform coefficients.
[0221] In another example, the video data block has a non-square shape.
[0222] In another example, S is 12, and video encoder 22 can be configured to round 12 to 16, wherein the modified size S' is 16. In another example, S is 24, and video encoder 22 can be configured to round 24 to 32, wherein the modified size S' is 32.

[0223] In one example, S is the width of the transform. In another example, S is the height of the transform.

[0224] Figure 13 is a flowchart showing an example decoding method of the disclosure. The techniques of Figure 13 can be performed by video decoder 30, including inverse transform processing unit 156 and/or inverse quantization unit 154.
[0225] In one example of the disclosure, video decoder 30 can be configured to receive an encoded block of video data (300), determine a transform for the encoded block of video data, wherein the transform has a size S that is not a power of two (302), and round S to a power of two, creating an inverse transform with a modified size S' (304). Video decoder 30 can be further configured to apply the inverse transform with the modified size S' to the encoded block of video data to create residual video data (306), and decode the residual video data to create a decoded block of video data (308).
[0226] In one example, video decoder 30 can be configured to round S to the nearest power of two. In another example, S is 12, and video decoder 30 can be configured to round 12 to 16, wherein the modified size S' is 16. In another example, S is 24, and video decoder 30 can be configured to round 24 to 32, wherein the modified size S' is 32.
[0227] In another example, the encoded block of video data includes inverse-quantized transform coefficients. In another example, the encoded block of video data has a non-square shape. In one example, S is the width of the transform. In another example, S is the height of the transform.
[0228] In another example of the disclosure, video decoder 30 can be configured to determine shift values for the inverse transform based on S'.

[0229] Certain aspects of this disclosure have been described with respect to extensions of the HEVC standard for purposes of illustration. However, the techniques described in this disclosure may be useful for other video coding processes, including other standard or proprietary video coding processes not yet developed.
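One plausible reading of the shift (displacement) values mentioned above, sketched here as an assumption rather than the patent's exact normalization: once the size is rounded to a power of two S', the transform's size-dependent normalization becomes an exact bit shift of log2(S'), which a non-power-of-two size would not allow. The bit-depth offset in the formula is hypothetical:

```python
def inverse_transform_shift(s_prime, bit_depth=8):
    """Derive a normalization shift from the power-of-two size S'.

    The formula (shift = log2(S') + a fixed offset minus the bit depth)
    mirrors an HEVC-style convention and is an assumption for illustration;
    the disclosure only states that the shift values depend on S'.
    """
    assert s_prime & (s_prime - 1) == 0, "S' must be a power of two"
    log2_size = s_prime.bit_length() - 1  # exact log2 for powers of two
    return log2_size + 6 - bit_depth      # hypothetical offset of 6

print(inverse_transform_shift(16))  # 2  (log2(16) = 4; 4 + 6 - 8)
print(inverse_transform_shift(32))  # 3
```

The assertion makes the dependency explicit: the integer log2 used for the shift is only exact because S' has been rounded to a power of two.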
[0230] A video coder, as described in this disclosure, can refer to a video encoder or a video decoder. Likewise, a video coding unit can refer to a video encoder or a video decoder. Likewise, video coding can refer to video encoding or video decoding, as applicable. In this disclosure, the phrase "based on" may indicate "based only on," "based at least in part on," or "based in some way on." This disclosure may use the term "video unit" or "video block" or "block" to refer to one or more sample blocks and the syntax structures used to code samples of the one or more sample blocks. Example types of video units can include CTUs, CUs, PUs, transform units (TUs), macroblocks, macroblock partitions, and so on. In some contexts, discussion of PUs can be interchanged with discussion of macroblocks or macroblock partitions. Example types of video blocks can include coding tree blocks, coding blocks, and other types of blocks of video data.
[0231] It should be recognized that, depending on the example, certain acts or events of any of the techniques described here can be performed in a different sequence, added, merged, or left out (for example, not all acts or events described are necessary for the practice of the techniques). In addition, in certain examples, acts or events can be performed simultaneously, for example, through multitasking processing, interrupt processing, or multiple processors, instead of sequentially.
[0232] In one or more examples, the functions described can be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions can be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, for example, according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media that is non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media can be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.
[0233] By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
[0234] Instructions can be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein, can refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some examples, the functionality described herein can be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
[0235] The techniques of this disclosure can be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (for example, a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units can be combined in a codec hardware unit or provided by a collection of interoperable hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
[0236] Several examples have been described. These and other examples are within the scope of the following claims.
Claims
1. Method of decoding video data, the method comprising:
receiving an encoded block of video data;
determining a transform for the encoded block of video data, wherein the transform has a size S that is not a power of two;
rounding S to a power of two, creating an inverse transform with a modified size S';
applying the inverse transform with the modified size S' to the encoded block of video data to create residual video data; and decoding the residual video data to create a decoded block of video data.
2. Method according to claim 1, wherein rounding S to a power of two comprises rounding S to the nearest power of two.

3. Method according to claim 1, wherein the encoded block of video data includes inverse-quantized transform coefficients.

4. Method according to claim 1, wherein the encoded block of video data has a non-square shape.

5. Method according to claim 1, wherein S is 12, and wherein rounding S to a power of two comprises rounding 12 to 16, wherein the modified size S' is 16.
6. Method according to claim 1, wherein S is 24, and wherein rounding S to a power of two comprises rounding 24 to 32, wherein the modified size S' is 32.
7. Method according to claim 1, wherein S is the width of the transform.
8. Method according to claim 1, wherein S is the height of the transform.
9. Method according to claim 1, further comprising:
determining shift values for the inverse transform based on S'.
10. Method of encoding video data, the method comprising:
receiving a video data block;
predicting the video data block to create residual video data;
determining a transform for the residual video data, wherein the transform has a size S that is not a power of two;
rounding S to a power of two, creating a transform with a modified size S';
applying the transform with the modified size S' to the residual video data to create transform coefficients; and encoding the transform coefficients into an encoded video bit stream.
11. Method according to claim 10, wherein rounding S to a power of two comprises rounding S to the nearest power of two.
12. Method according to claim 10, further comprising:
quantizing the transform coefficients.
13. Method according to claim 10, wherein the video data block has a non-square shape.
14. Method according to claim 10, wherein S is 12, and wherein rounding S to a power of two comprises rounding 12 to 16, wherein the modified size S' is 16.
15. Method according to claim 10, wherein S is 24, and wherein rounding S to a power of two comprises rounding 24 to 32, wherein the modified size S' is 32.

16. Method according to claim 10, wherein S is the width of the transform.

17. Method according to claim 10, wherein S is the height of the transform.

18. Apparatus configured to decode video data, the apparatus comprising:
a memory configured to store the video data; and one or more processors in communication with the memory, the one or more processors configured to:
receive an encoded block of video data;
determine a transform for the encoded block of video data, wherein the transform has a size S that is not a power of two;
round S to a power of two, creating an inverse transform with a modified size S';
apply the inverse transform with the modified size S' to the encoded block of video data to create residual video data; and decode the residual video data to
create a decoded block of video data.
19. Apparatus according to claim 18, wherein, to round S to a power of two, the one or more processors are configured to round S to the nearest power of two.
20. Apparatus according to claim 18, wherein the encoded block of video data includes inverse-quantized transform coefficients.
21. Apparatus according to claim 18, wherein the encoded block of video data has a non-square shape.
22. Apparatus according to claim 18, wherein S is 12, and wherein, to round S to a power of two, the one or more processors are configured to round 12 to 16, wherein the modified size S' is 16.

23. Apparatus according to claim 18, wherein S is 24, and wherein, to round S to a power of two, the one or more processors are configured to round 24 to 32, wherein the modified size S' is 32.

24. Apparatus according to claim 18, wherein S is the width of the transform.

25. Apparatus according to claim 18, wherein S is the height of the transform.

26. Apparatus according to claim 18, wherein the one or more processors are further configured to:
determine shift values for the inverse transform based on S'.
27. Apparatus according to claim 18, further comprising:
a display configured to display the decoded block of video data.
28. Apparatus configured to encode video data, the apparatus comprising:
a memory configured to store the video data; and one or more processors in communication with the memory, the one or more processors configured to:
receive a video data block;
predict the video data block to create residual video data;
determine a transform for the residual video data, wherein the transform has a size S that is not a power of two;
round S to a power of two, creating a transform with a modified size S';
apply the transform with the modified size S' to the residual video data to create transform coefficients; and encode the transform coefficients into an encoded video bit stream.
29. The apparatus of claim 28, wherein to round S to a power of two, the one or more processors are configured to round S to the nearest power of two.
30. The apparatus of claim 28, wherein the one or more processors are further configured to:
quantize the transform coefficients.
31. The apparatus of claim 28, wherein the block of video data has a non-square shape.
32. The apparatus of claim 28, wherein S is 12, and wherein to round S to a power of two, the one or more processors are configured to round 12 to 16, wherein the modified size S' is 16.
33. The apparatus of claim 28, wherein S is 24, and wherein to round S to a power of two, the one or more processors are configured to round 24 to 32, wherein the modified size S' is 32.
34. The apparatus of claim 28, wherein S is a width of the transform.
35. The apparatus of claim 28, wherein S is a height of the transform.
36. The apparatus of claim 28, further comprising:
a camera configured to capture the video data.
37. An apparatus configured to decode video data, the apparatus comprising:
means for receiving an encoded block of video data;
means for determining a transform for the encoded block of video data, wherein the transform has a size S that is not a power of two;
means for rounding S to a power of two, creating an inverse transform with a modified size S';
means for applying the inverse transform with the modified size S' to the encoded block of video data to create residual video data; and
means for decoding the residual video data to create a decoded block of video data.
38. An apparatus configured to encode video data, the apparatus comprising:
means for receiving a block of video data;
means for predicting the block of video data to create residual video data;
means for determining a transform for the residual video data, wherein the transform has a size S that is not a power of two;
means for rounding S to a power of two, creating a transform with a modified size S';
means for applying the transform with the modified size S' to the residual video data to create transform coefficients; and
means for encoding the transform coefficients in an encoded video bitstream.
39. A computer-readable storage medium storing instructions that, when executed, cause one or more processors of a device configured to decode video data to:
receive an encoded block of video data;
determine a transform for the encoded block of video data, wherein the transform has a size S that is not a power of two;
round S to a power of two, creating an inverse transform with a modified size S';
apply the inverse transform with the modified size S' to the encoded block of video data to create residual video data; and
decode the residual video data to create a decoded block of video data.
40. A computer-readable storage medium storing instructions that, when executed, cause one or more processors of a device configured to encode video data to:
receive a block of video data;
predict the block of video data to create residual video data;
determine a transform for the residual video data, wherein the transform has a size S that is not a power of two;
round S to a power of two, creating a transform with a modified size S';
apply the transform with the modified size S' to the residual video data to create transform coefficients; and
encode the transform coefficients in an encoded video bitstream.
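As a toy end-to-end illustration of the encode/decode flow recited in claims 28 and 39 — the transform kernels below are stand-ins (zero-padding and truncation), not the codec's actual DCT/DST matrices, and the round-up rule is just one rounding choice consistent with the 12 -> 16 and 24 -> 32 examples:

```python
def round_up_to_power_of_two(s: int) -> int:
    """Smallest power of two >= s (one possible rounding rule)."""
    p = 1
    while p < s:
        p <<= 1
    return p

def forward_transform(residual, s_mod):
    # Stand-in for a real transform of modified size S': zero-pad
    # the residual to S' coefficients and pass it through unchanged.
    return residual + [0] * (s_mod - len(residual))

def inverse_transform(coeffs, s):
    # Stand-in inverse: drop the padding to recover the original size S.
    return coeffs[:s]

def encode_then_decode(block, prediction):
    s = len(block)                        # non-power-of-two size, e.g. 12
    s_mod = round_up_to_power_of_two(s)   # modified size S'
    residual = [b - p for b, p in zip(block, prediction)]    # encoder: predict
    coeffs = forward_transform(residual, s_mod)              # encoder: transform at S'
    rec_residual = inverse_transform(coeffs, s)              # decoder: inverse transform
    return [r + p for r, p in zip(rec_residual, prediction)] # decoder: reconstruct
```

With lossless stand-in transforms the reconstructed block equals the input block; in a real codec the quantization step of claim 30 would sit between the forward transform and the inverse transform.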
Similar technologies:
Publication number | Publication date | Patent title
BR112019013645A2|2020-01-21|multi-type tree structure for video encoding
AU2017207452B2|2020-08-27|Multi-type-tree framework for video coding
CA3026657A1|2018-01-18|Signaling of quantization information in non-quadtree-only partitioned video coding
BR112014011062B1|2021-06-29|METHOD AND APPARATUS SET TO ENCODE AND DECODE TRANSFORM COEFFICIENTS FOR A TRANSFORM BLOCK OF A VIDEO BLOCK AND COMPUTER-READABLE STORAGE MEDIA
BR112014011063B1|2021-07-20|CONTEXT REDUCTION FOR CONTEXT ADAPTIVE BINARY ARITHMETIC CODING
AU2018282523A1|2019-12-05|Intra filtering applied together with transform processing in video coding
EP3593531A1|2020-01-15|Intra filtering flag in video coding
BR112019014090A2|2020-02-04|intraprevision techniques for video encoding
BR112019013705A2|2020-04-28|temporal prediction of modified adaptive loop filter to support time scalability
BR112020019715A2|2021-02-09|combination of extended position-dependent intraprediction with angular modes
US10972758B2|2021-04-06|Multi-type-tree framework for transform in video coding
BR112021000352A2|2021-04-06|COMBINATION OF POSITION-DEPENDENT INTRAPREDITION WITH WIDE-ANGLE INTRAPREDITION
BR112019027071A2|2020-07-07|Improved intra-prediction in video coding
BR112021004492A2|2021-05-25|adaptive multiple transform coding
BR112021009714A2|2021-08-17|regular coded bin reduction for coefficient decoding using limit and rice parameter
EP3738314A1|2020-11-18|Multiple-model local illumination compensation
BR112019010547A2|2019-09-17|indication of using bilateral filter in video coding
OA18378A|2018-11-02|Contexts for large coding tree units.
Patent family:
Publication number | Publication date
TW201841501A|2018-11-16|
WO2018129322A1|2018-07-12|
TWI728220B|2021-05-21|
KR102292788B1|2021-08-24|
CN110073661A|2019-07-30|
CN110073661B|2021-09-14|
JP2020504506A|2020-02-06|
US20180199072A1|2018-07-12|
KR20190104032A|2019-09-05|
US10848788B2|2020-11-24|
EP3566439A1|2019-11-13|
Cited references:
Publication number | Filing date | Publication date | Applicant | Patent title

BRPI0818649A2|2007-10-16|2015-04-07|Thomson Licensing|Methods and apparatus for encoding and decoding video in geometrically partitioned superblocks.|
FR2926694B1|2008-01-18|2010-04-16|Sagem Comm|VIDEO DATA DECODER AND METHOD FOR DECODING VIDEO DATA|
US9110849B2|2009-04-15|2015-08-18|Qualcomm Incorporated|Computing even-sized discrete cosine transforms|
KR101487686B1|2009-08-14|2015-01-30|삼성전자주식회사|Method and apparatus for video encoding, and method and apparatus for video decoding|
JP5672678B2|2009-08-21|2015-02-18|Tdk株式会社|Electronic component and manufacturing method thereof|
CN102577393B|2009-10-20|2015-03-25|夏普株式会社|Moving image coding device, moving image decoding device, moving image coding/decoding system, moving image coding method and moving image decoding method|
KR101457396B1|2010-01-14|2014-11-03|삼성전자주식회사|Method and apparatus for video encoding using deblocking filtering, and method and apparatus for video decoding using the same|
KR20120090740A|2011-02-07|2012-08-17|휴맥스|Apparatuses and methods for encoding/decoding of video using filter in a precise unit|
JP5832519B2|2010-04-13|2015-12-16|サムスン エレクトロニクス カンパニー リミテッド|Video encoding method and apparatus based on encoding unit based on tree structure, and video decoding method and apparatus|
DK2559245T3|2010-04-13|2015-08-24|Ge Video Compression Llc|Video Coding using multitræsunderinddeling Images|
US20120170648A1|2011-01-05|2012-07-05|Qualcomm Incorporated|Frame splitting in video coding|
US9807424B2|2011-01-10|2017-10-31|Qualcomm Incorporated|Adaptive selection of region size for identification of samples in a transition zone for overlapped block motion compensation|
US8548057B2|2011-01-25|2013-10-01|Microsoft Corporation|Video coding redundancy reduction|
RU2603552C2|2011-06-24|2016-11-27|Сан Пэтент Траст|Image decoding method, image encoding method, image decoding device, image encoding device and image encoding and decoding device|
US9883203B2|2011-11-18|2018-01-30|Qualcomm Incorporated|Adaptive overlapped block motion compensation|
US9462275B2|2012-01-30|2016-10-04|Qualcomm Incorporated|Residual quad tree coding for video coding|
JP2013229674A|2012-04-24|2013-11-07|Sharp Corp|Image coding device, image decoding device, image coding method, image decoding method, image coding program, and image decoding program|
EP2951999A4|2013-01-30|2016-07-20|Intel Corp|Content adaptive parametric transforms for coding for next generation video|
GB2513111A|2013-04-08|2014-10-22|Sony Corp|Data encoding and decoding|
US9906813B2|2013-10-08|2018-02-27|Hfi Innovation Inc.|Method of view synthesis prediction in 3D video coding|
US10687079B2|2014-03-13|2020-06-16|Qualcomm Incorporated|Constrained depth intra mode coding for 3D video coding|
KR20170002460A|2014-06-11|2017-01-06|엘지전자 주식회사|Method and device for encodng and decoding video signal by using embedded block partitioning|
FR3029333A1|2014-11-27|2016-06-03|Orange|METHOD FOR ENCODING AND DECODING IMAGES, CORRESPONDING ENCODING AND DECODING DEVICE AND COMPUTER PROGRAMS|
WO2016090568A1|2014-12-10|2016-06-16|Mediatek Singapore Pte. Ltd.|Binary tree block partitioning structure|
WO2016154963A1|2015-04-01|2016-10-06|Mediatek Inc.|Methods for chroma coding in video codec|
US10200719B2|2015-11-25|2019-02-05|Qualcomm Incorporated|Modification of transform coefficients for non-square transform units in video coding|
AU2015261734A1|2015-11-30|2017-06-15|Canon Kabushiki Kaisha|Method, apparatus and system for encoding and decoding video data according to local luminance intensity|
US10212444B2|2016-01-15|2019-02-19|Qualcomm Incorporated|Multi-type-tree framework for video coding|
US11223852B2|2016-03-21|2022-01-11|Qualcomm Incorporated|Coding video data using a two-level multi-type-tree framework|EP3430808A4|2016-03-16|2020-01-15|Mediatek Inc.|Method and apparatus of video data processing with restricted block size in video coding|
US10609423B2|2016-09-07|2020-03-31|Qualcomm Incorporated|Tree-type coding for video coding|
EP3349455A1|2017-01-11|2018-07-18|Thomson Licensing|Method and device for coding a block of video data, method and device for decoding a block of video data|
JP6980920B2|2017-12-21|2021-12-15|エルジー エレクトロニクス インコーポレイティドLg Electronics Inc.|Video coding method based on selective conversion and its equipment|
KR20190081383A|2017-12-29|2019-07-09|인텔렉추얼디스커버리 주식회사|Video coding method and apparatus using sub-block level intra prediction|
WO2019185815A1|2018-03-29|2019-10-03|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Partitioning concepts for block-based picture coding|
US10972758B2|2018-04-02|2021-04-06|Qualcomm Incorporated|Multi-type-tree framework for transform in video coding|
EP3793197A4|2018-05-10|2022-02-16|Samsung Electronics Co Ltd|Image segmentation method and apparatus for image encoding and decoding|
WO2019234612A1|2018-06-05|2019-12-12|Beijing Bytedance Network Technology Co., Ltd.|Partition tree with four sub-blocks symmetric or asymmetric|
US10887594B2|2018-07-05|2021-01-05|Mediatek Inc.|Entropy coding of coding units in image and video data|
WO2020035057A1|2018-08-16|2020-02-20|Mediatek Inc.|Methods and apparatuses of signaling quantization parameter in video processing system|
AU2018217336A1|2018-08-17|2020-03-05|Canon Kabushiki Kaisha|Method, apparatus and system for encoding and decoding a transformed block of video samples|
CN113170098A|2018-12-07|2021-07-23|华为技术有限公司|Constrained prediction modes for video coding|
CN113711615A|2019-02-15|2021-11-26|北京字节跳动网络技术有限公司|Non-quadratic partition tree in video compression|
WO2020171681A1|2019-02-19|2020-08-27|주식회사 윌러스표준기술연구소|Intra prediction-based video signal processing method and device|
WO2020182182A1|2019-03-12|2020-09-17|Beijing Bytedance Network Technology Co., Ltd.|Compound triple tree in video coding|
US20200304815A1|2019-03-22|2020-09-24|Tencent America LLC|Method and apparatus for video coding|
US11032543B2|2019-03-22|2021-06-08|Tencent America LLC|Method and apparatus for video coding|
US11190777B2|2019-06-30|2021-11-30|Tencent America LLC|Method and apparatus for video coding|
WO2021086151A1|2019-11-01|2021-05-06|엘지전자 주식회사|Transform-based image coding method and device for same|
WO2021086149A1|2019-11-01|2021-05-06|엘지전자 주식회사|Image coding method based on transform, and device therefor|
WO2021086152A1|2019-11-01|2021-05-06|엘지전자 주식회사|Transform-based method for coding image, and device therefor|
Legal status:
2021-10-13| B350| Update of information on the portal [chapter 15.35 patent gazette]|
Priority:
Application number | Filing date | Patent title
US201762443569P| true| 2017-01-06|2017-01-06|
US15/862,203|US10848788B2|2017-01-06|2018-01-04|Multi-type-tree framework for video coding|
PCT/US2018/012589|WO2018129322A1|2017-01-06|2018-01-05|Multi-type-tree framework for video coding|